Abstract



In this paper we introduce objective proper prior distributions for hypothesis testing and 
model selection based on measures of divergence between the competing models; we call 
them divergence based (DB) priors. DB priors have simple forms and desirable properties, 
like information (finite sample) consistency; often, they are similar to other existing proposals 
like the intrinsic priors; moreover, in normal linear models scenarios, they exactly reproduce 
Jeffreys-Zellner-Siow priors. Most importantly, in challenging scenarios such as irregular
models and mixture models, the DB priors are well defined and very reasonable, while 
alternative proposals are not. We derive approximations to the DB priors as well as MCMC 
and asymptotic expressions for the associated Bayes factors. 

Keywords: Bayes factors; Information Consistency; Intrinsic priors; Irregular models; 
Kullback-Leibler divergence; Mixture models.



1 Introduction 

For the data $y$, with density $f(y \mid \theta, \nu)$, we consider the hypothesis testing problem:

$$H_1: \theta = \theta_0 \quad \text{vs.} \quad H_2: \theta \neq \theta_0, \qquad (1)$$

where $\theta_0 \in \Theta$ is a known value. This is equivalent to the model selection problem of choosing
between models:

$$M_1: f_1(y \mid \nu_1) = f(y \mid \theta_0, \nu_1) \quad \text{vs.} \quad M_2: f_2(y \mid \theta, \nu_2) = f(y \mid \theta, \nu_2), \qquad (2)$$

where the notation reflects the fact that often $\nu_1$ and $\nu_2$ represent different quantities in each
model. In Jeffreys' scenarios (Jeffreys, 1961), $\nu_1$ and $\nu_2$ had the same meaning; he called $\theta$ the
new parameter, and $\nu_1$ and $\nu_2$ the common parameters (also known as nuisance parameters).
We revisit this issue in Section 4. 

We aim for an objective Bayes solution to this model selection problem; that is, no 'external' 
(subjective) information is assumed, other than the data, y, and the information implicitly 
needed to pose the problem, choose the competing models, etc. An excellent exposition of the 
advantages of Bayesian methods, especially objective Bayes methods, for problems with model
uncertainty is Berger and Pericchi (2001). 

Usual Bayesian solutions (for $0$-$k_i$ loss functions) to (1) (or, equivalently, to (2)) are based
on the posterior odds:

$$\frac{\Pr(H_1 \mid y)}{\Pr(H_2 \mid y)} = \frac{\Pr(H_1)}{\Pr(H_2)} \times B_{12},$$

where $\Pr(H_i)$, $i = 1, 2$, are the prior probabilities of the hypotheses, and $B_{12}$ is the Bayes factor for
$H_1$ against $H_2$:

$$B_{12} = \frac{m_1(y)}{m_2(y)} = \frac{\int f_1(y \mid \nu_1)\,\pi_1(\nu_1)\,d\nu_1}{\int f_2(y \mid \theta, \nu_2)\,\pi_2(\theta, \nu_2)\,d\theta\,d\nu_2}, \qquad (3)$$

where $\pi_1(\nu_1)$ is the prior under $H_1$ and $\pi_2(\theta, \nu_2)$ the prior under $H_2$. That is, $B_{12}$ is the ratio of
the marginal (averaged) likelihoods of the models.
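As a small illustration of how the posterior odds combine the prior odds with the Bayes factor, the following Python sketch converts a value of $B_{12}$ into the posterior probability of $H_1$. The function name `posterior_prob_h1` and its default prior probability of 1/2 are assumptions made here for illustration; they are not part of the paper.

```python
def posterior_prob_h1(b12: float, prior_h1: float = 0.5) -> float:
    """Return Pr(H1 | y) given the Bayes factor B12 and the prior probability Pr(H1)."""
    if not 0.0 < prior_h1 < 1.0:
        raise ValueError("prior_h1 must lie strictly between 0 and 1")
    if b12 < 0.0:
        raise ValueError("a Bayes factor cannot be negative")
    # posterior odds = prior odds x Bayes factor
    posterior_odds = prior_h1 / (1.0 - prior_h1) * b12
    return posterior_odds / (1.0 + posterior_odds)
```

With equal prior probabilities, a Bayes factor of 3 in favor of $H_1$ corresponds to a posterior probability of 3/4 for $H_1$.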

It is common practice in objective Bayes approaches to concentrate on derivations of the
Bayes factors, leaving the ultimate choice (whether objective or subjective) of the prior model
probabilities (and the derivation of the posterior odds) to the user. Bayes factors were extensively
used by Jeffreys (1961) as a measure of evidence in favor of a model (see also Berger, 1985;
Berger and Delampady, 1987; and Berger and Sellke, 1987); Kass and Raftery (1995) is a good
reference for review and applications. Bayes factors are also crucial ingredients of model averaging
approaches (see Clyde, 1999; Hoeting et al., 1999). In the rest of the paper, we concentrate
on the derivation of objective priors to compute Bayes factors.

A main issue for deriving objective Bayes factors is the appropriate choice of $\pi_1(\nu_1)$ and $\pi_2(\theta, \nu_2)$
for use in (3). It is well known that familiar improper objective priors (or non-informative priors)
for estimation problems (under a fixed model) are usually seriously inadequate in the presence
of model uncertainty, generally producing arbitrary answers. (Interesting exceptions are studied
in Berger, Pericchi and Varshavsky, 1998.) Of course, when improper priors cannot be used,
use of arbitrarily vague (but proper) priors is not a cure, and generally it is even worse. Another
bad solution often encountered in practice is use of an apparently 'innocuous', harmless, but
yet arbitrary, proper prior, since it can severely dominate the likelihood in ways that are not
anticipated (and cannot be investigated for high dimensional problems).

There are two basic approaches to compute Bayes factors when there is not enough information
available for trustworthy subjective assessment of $\pi_1(\nu_1)$ and $\pi_2(\theta, \nu_2)$. A very successful
one is to directly derive the objective Bayes factors themselves, usually by 'training' and calibrating
in several ways the inappropriate Bayes factors obtained from usual objective improper
priors (see Berger and Pericchi, 2001 for reviews and references). However, all these objective
Bayes factors should ultimately be checked to correspond (approximately) to a genuine Bayes
factor derived from a sensible prior. The alternative approach is to look for 'formal rules' for
constructing 'objective' but proper priors that have nice properties and are appropriate for use
in model selection; Bayes factors are then just computed from these objective proper priors.
Whether these Bayes factors are appropriate can then be directly judged from the adequacy of
the priors used.

Choice of prior distributions in scenarios of model uncertainty is still largely an open ques- 
tion, and only partial answers are known. Several methods have been proposed for use in general 
scenarios, like the arithmetic intrinsic (AI) priors (Berger and Pericchi, 1996; Moreno, Bertolino 
and Racugno, 1998); the fractional intrinsic (FI) priors (De Santis and Spezaferri, 1999; Berger 
and Mortera, 1999); the expected posterior (EP) priors (Perez and Berger, 2002); the unit in- 
formation priors (Kass and Wasserman, 1995) and predictively matched priors (Ibrahim and 
Laud, 1994; Laud and Ibrahim, 1995; Berger, Pericchi and Varshavsky, 1998; Berger and Pericchi,
2001). In the specific context of linear models, widely used priors with nice properties
are the Jeffreys-Zellner-Siow (JZS) priors (Jeffreys, 1961; Zellner and Siow, 1980, 1984; Bayarri and
García-Donato, 2007). An interesting generalization is the mixtures of $g$-priors (Liang et al.,
2007).

All these methods are insightful, provide many interesting and useful ideas, and indeed have
been shown to behave nicely in a number of testing and model selection problems. Nonetheless,
except for the very specific scenario of linear models, nobody seems to have investigated the
ramifications of Jeffreys' (1961) pioneering proposal (see the end of Section 2). His was indeed
the first general derivation of objective priors for hypothesis testing, and was intended as a
generalization of his proposal for testing a normal mean. Given the success of the generalization
of Jeffreys' testing prior to linear models (Zellner and Siow, 1980, 1984; Bayarri and
García-Donato, 2007), it is somewhat surprising that his general proposal has not been pursued. We
think that it is historically important to pursue this investigation, and we do so in this paper.

Specifically, we generalize Jeffreys' pioneering suggestion, and use divergence measures between
the competing models to derive the required (proper) priors. We call these priors
divergence based (DB) priors. The main motivation was to generalize the useful JZS priors for
use in scenarios other than the normal linear model, while at the same time extending Jeffreys'
general proposal. We will show that indeed the DB priors are the JZS priors in linear model
contexts; also, they are as easy to derive as (often easier than) other popular proposals (AI, FI
or EP priors), being quite similar to them in many instances; most interestingly, they are well
defined in certain scenarios where all of the other proposals fail.

For clarity of exposition, we consider first the case when there are no nuisance parameters.
Development for the general case is delayed till Section 4, once the basic ideas have been
introduced, and the behavior of DB priors studied in this considerably simpler scenario.

2 DB priors 

Assume first the problem without nuisance parameters: 

$$M_1: f_1(y) = f(y \mid \theta_0) \quad \text{vs.} \quad M_2: f_2(y \mid \theta) = f(y \mid \theta). \qquad (4)$$

That is, the simpler model ($M_1$) involves no unknown parameters; hence only the prior for $\theta$
under $M_2$ is needed. We drop the subindex used in the previous section and denote such a prior simply
by $\pi(\theta)$; clearly $\pi(\theta)$ has to be proper.

Our proposal for DB priors for $\theta$ will be in terms of divergence measures between the competing
models $f(y \mid \theta_0)$ and $f(y \mid \theta)$, based on Kullback-Leibler directed divergences

$$KL[\theta_0 : \theta] = \int \left[\log f(y \mid \theta) - \log f(y \mid \theta_0)\right] f(y \mid \theta)\,dy, \qquad (5)$$

(assuming continuous $y$ for simplicity). KL is a measure of the information in $y$ to discriminate
between $\theta$ and $\theta_0$; it is designed to measure how far apart the two competing densities are in
the sense of the likelihood (Schervish, 1995).

We do not directly use KL to define the DB prior because it is not symmetric with respect
to its arguments, and hence it would likely result in nonsymmetric priors; however, symmetric
measures of divergence can be derived by taking sums (which was Jeffreys' choice) or minimums
of KL divergences. We define:

$$D^S[\theta, \theta_0] = KL[\theta : \theta_0] + KL[\theta_0 : \theta], \qquad (6)$$

and

$$D^M[\theta, \theta_0] = 2 \times \min\{KL[\theta : \theta_0],\, KL[\theta_0 : \theta]\}. \qquad (7)$$

We multiply the minimum by 2 in the definition of $D^M$ so that both measures are on the
same scale; indeed, in some symmetric models (like in the normal scenario) both measures of
divergence coincide. Generalizations of KL, and extensions to include marginal parameters, are
discussed later in the paper. Note that $D^M$ is well defined even when one of the directed KL divergences
is not, which is the case when the competing models have different support. Except for these
irregular scenarios, $D^S$ is well defined and it is considerably easier to derive than $D^M$. Most
of the derivations and properties to follow are common to both $D^S$ and $D^M$. To avoid tedious
repetitions, we then simply use $D$ to refer to either of them. We use the superindex $S$ or $M$
only when necessary.
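The two symmetrized divergences can be sketched numerically as follows. This is a minimal Python illustration, not code from the paper: the helper names are hypothetical, the Bernoulli and normal-mean KL expressions are the standard per-observation closed forms, and `kl(a, b)` is read as $KL[a : b]$ with the expectation taken under the second argument, following (5).

```python
import math

def kl_bernoulli(theta: float, theta0: float) -> float:
    """KL[theta : theta0] for one Bernoulli observation (expectation under theta0)."""
    return (theta0 * math.log(theta0 / theta)
            + (1.0 - theta0) * math.log((1.0 - theta0) / (1.0 - theta)))

def kl_normal_mean(theta: float, theta0: float, sigma: float = 1.0) -> float:
    """KL between N(theta, sigma^2) and N(theta0, sigma^2); symmetric in its arguments."""
    return (theta - theta0) ** 2 / (2.0 * sigma ** 2)

def d_sum(kl, theta: float, theta0: float) -> float:
    """Sum-symmetrized divergence D^S of (6)."""
    return kl(theta, theta0) + kl(theta0, theta)

def d_min(kl, theta: float, theta0: float) -> float:
    """Min-symmetrized divergence D^M of (7); the factor 2 puts it on the scale of D^S."""
    return 2.0 * min(kl(theta, theta0), kl(theta0, theta))
```

By construction `d_min` never exceeds `d_sum`, and for the normal mean, where the directed divergences are equal, the two coincide.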

It is well known that $D \ge 0$, with equality if and only if $\theta = \theta_0$, although it is not a metric (the
triangle inequality does not hold). Our proposal is based on unitary measures of divergence, $\bar{D}$,
which we take to be $D$ divided by the effective sample size $n^*$, $\bar{D} = D/n^*$. In simple univariate
i.i.d. data the effective sample size equals the number of scalar data points, but it does not need to
be so in general. Indeed, in complex situations, it can be a difficult concept; although there have
been several attempts in the literature to formalize it (see e.g. Pauler, 1998; Pauler, Wakefield
and Kass, 1999; Berger et al., 2007), no generally agreed definition seems to exist. In all of the
examples of this paper, it is quite clear what $n^*$ should be, so we rely for now on simple, intuitive
interpretations.

2.1 Motivation: scalar location parameters

Suppose $y$ is a random sample from a univariate location family:

$$f(y \mid \theta) = \prod_{i=1}^{n} f(y_i \mid \theta) = \prod_{i=1}^{n} g(y_i - \theta), \quad \theta \in \mathbb{R}.$$

It has been argued (Berger and Delampady, 1987; Berger and Sellke, 1987) that in symmetric
problems with $\Theta = \mathbb{R}$, objective testing priors $\pi(\theta)$ under $H_2: \theta \neq \theta_0$ should be unimodal and
symmetric about $\theta_0$; these priors prevent introducing excessive bias toward $H_2$. Accordingly, we
look for a proper $\pi(\theta)$ which, in this simple scenario, has these desirable characteristics
and which is easily generalizable to other situations.

As before, let $\bar{D}$ be a unitary symmetrized divergence. We consider use of a function $h$
of $\bar{D}$ as a testing prior under $H_2$; that is, $\pi(\theta) \propto h(\bar{D}[\theta, \theta_0])$. Since $\pi$ has to be proper, $h(t)$
has to be a decreasing (non-increasing) function for $t > 0$. A first possibility could be to take
$h(t) = \exp\{-qt\}$ for some $q > 0$, but this results in priors with short tails. Short-tailed priors are
usually not adequate for model selection, since they tend to exhibit undesirable (finite sample)
inconsistent behavior (see Liang et al., 2007).

We explore instead use of the functions $h_q(t) = (1 + t)^{-q}$, where $q > 0$ controls the thickness of
the tails of $\pi(\theta)$. Let

$$c(q) = \int \left(1 + \bar{D}[\theta, \theta_0]\right)^{-q} d\theta,$$

and define

$$\underline{q} = \inf\{q \ge 0 : c(q) < \infty\}, \qquad q_* = \underline{q} + 1/2.$$
For finite $q_*$, our specific proposal for a DB prior in this location problem is

$$\pi^D(\theta) = c(q_*)^{-1} \left(1 + \bar{D}[\theta, \theta_0]\right)^{-q_*} \propto h_{q_*}\!\left(\bar{D}[\theta, \theta_0]\right). \qquad (8)$$

Generalization to vector valued $\theta$ is trivial.

We use $q_*$ instead of the more natural $\underline{q}$ because $\underline{q}$ is not guaranteed to produce proper priors.
Of course, if $\underline{q}$ is finite, any $q = \underline{q} + \delta$, with $\delta > 0$, results in proper priors, and hence could have
been used to define a DB prior. Our specific proposal, $\delta = 1/2$, was chosen to reproduce the
well known Jeffreys-Zellner-Siow prior in the Normal context; in general this choice results in
densities with heavy tails. Moreover, we have found that in general $0 < \delta < 1$ is a good
choice since it produces priors without moments, which in normal scenarios is needed to avoid
undesirable behavior of conjugate $g$-priors (Liang et al., 2007).

The following lemma establishes the desired symmetry and unimodality of the DB prior.
The proof follows easily from properties of $\bar{D}$ in these location problems and is omitted.

Lemma 2.1. Assume $q_* < \infty$; then $\pi^D(\theta)$ is unimodal and symmetric around $\theta_0$.
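To make the construction concrete, consider a normal location family with known standard deviation $\sigma$ (an illustrative choice made here, with $n^* = n$). Then $\bar{D}[\theta, \theta_0] = (\theta - \theta_0)^2/\sigma^2$, $c(q)$ is finite exactly when $q > 1/2$, so $\underline{q} = 1/2$ and $q_* = 1$, and the DB prior is the Cauchy density centered at $\theta_0$ with scale $\sigma$. The Python sketch below checks this numerically; all function names are hypothetical, and the normalizing constant is computed with a midpoint rule after the substitution $\theta = \theta_0 + \sigma\tan\phi$.

```python
import math

def unit_divergence(theta: float, theta0: float, sigma: float) -> float:
    """Unitary symmetrized divergence for N(theta, sigma^2) data (sum and min versions agree)."""
    return ((theta - theta0) / sigma) ** 2

def c_of_q(q: float, sigma: float, n_grid: int = 10_000) -> float:
    """c(q) = integral over theta of (1 + Dbar)^(-q), via theta = theta0 + sigma * tan(phi)."""
    h = math.pi / n_grid
    total = 0.0
    for i in range(n_grid):
        phi = -math.pi / 2 + (i + 0.5) * h
        # (1 + tan^2 phi)^(-q) * sigma * sec^2 phi = sigma * cos(phi)^(2q - 2)
        total += sigma * math.cos(phi) ** (2.0 * q - 2.0)
    return total * h

def db_prior(theta: float, theta0: float, sigma: float, q_star: float = 1.0) -> float:
    """DB prior (8) for the normal location problem, with q_* = 1 by default."""
    return (1.0 + unit_divergence(theta, theta0, sigma)) ** (-q_star) / c_of_q(q_star, sigma)

def cauchy_pdf(theta: float, theta0: float, sigma: float) -> float:
    """Cauchy density with location theta0 and scale sigma."""
    return 1.0 / (math.pi * sigma * (1.0 + ((theta - theta0) / sigma) ** 2))
```

Under these assumptions `db_prior` and `cauchy_pdf` agree up to numerical error, which is the Jeffreys-Zellner-Siow behavior that the choice $\delta = 1/2$ is meant to reproduce.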

Definition of DB priors for scale parameters is also direct. Indeed, assume that $\theta$ is a scale
parameter for a positive random variable $X$; then, $\xi = \log\theta$ is a location parameter for $Y = \log X$,
with density $f^*(y \mid \xi)$. Applying the definition in (8), the DB prior for $\xi$ is:

$$\pi^D(\xi) \propto h_{q_*}\!\left(\bar{D}^*[\xi, \xi_0]\right), \qquad (9)$$

where $\xi_0 = \log(\theta_0)$ and $\bar{D}^*[\xi, \xi_0]$ is the unitary measure of divergence between $f^*(y \mid \xi_0)$ and
$f^*(y \mid \xi)$. Therefore, in the original parameterization:

$$\pi^D(\theta) \propto h_{q_*}\!\left(\bar{D}^*[\log\theta, \log\theta_0]\right) \frac{1}{\theta} = h_{q_*}\!\left(\bar{D}[\theta, \theta_0]\right) \pi^N(\theta), \qquad (10)$$

where, because of invariance of $\bar{D}$ under reparameterizations, $\bar{D}^*[\log\theta, \log\theta_0] = \bar{D}[\theta, \theta_0]$, and
$\pi^N(\theta) = 1/\theta$ is the non-informative prior (right Haar invariant prior) for $\theta$. Definition of DB
priors for general parameters, formalized in the next section, will basically be a generalization of (10).

2.2 General parameters 

Assume the more general problem (4) and let $\pi^N(\theta)$ be an objective (usually improper) 'estimation'
prior (reference, invariant, Jeffreys, uniform, ... prior) for $\theta$, and let $\xi$ be a transformation
such that $\pi^N(\xi) = 1$ for $\xi = \xi(\theta)$. We can then derive a DB prior for $\theta$ by considering $\xi$ as a
"location parameter", applying the definition (8), and transforming back to $\theta$. This transformation
was first proposed by Jeffreys (1961). Bernardo (2005) uses it with a reference prior $\pi^N$
for a scalar $\theta$, and notes that $\xi$ asymptotically behaves as a location parameter.

Giving $\xi$ a DB prior for location parameters results in:

$$\pi^D(\xi) \propto h_{q_*}\!\left(\bar{D}^*[\xi, \xi_0]\right), \qquad (11)$$

where, as before, $\bar{D}^*[\xi, \xi_0]$ denotes the 'unit' (symmetrized) discrepancy between $f^*(y \mid \xi)$ and
$f^*(y \mid \xi_0)$, and $\xi_0 = \xi(\theta_0)$. Hence, the corresponding (DB) prior for $\theta$ is

$$\pi^D(\theta) \propto h_{q_*}\!\left(\bar{D}^*[\xi(\theta), \xi(\theta_0)]\right) |J_\xi(\theta)| \propto h_{q_*}\!\left(\bar{D}[\theta, \theta_0]\right) \pi^N(\theta), \qquad (12)$$

as long as $\pi^N$ is invariant under transformations; $J_\xi(\theta)$ is the Jacobian of the transformation. It
should be noted from (12) that the explicit transformation to $\xi$ is not needed in order to derive
the prior $\pi^D$. We can now formally define a DB prior as follows:

Definition 2.1. (General DB priors) For the model selection problem (4), let $\bar{D}[\theta, \theta_0]$ be a
unitary measure of divergence between $f(y \mid \theta)$ and $f(y \mid \theta_0)$. Also let $\pi^N(\theta)$ be an objective
(possibly improper) estimation prior for $\theta$ under the complex model, $M_2$, and $h_q(\cdot)$ be a decreasing
function. Define:

$$\underline{q} = \inf\{q \ge 0 : c(q) < \infty\}, \qquad q_* = \underline{q} + 1/2,$$

where $c(q) = \int h_q(\bar{D}[\theta, \theta_0])\,\pi^N(\theta)\,d\theta$. If $q_* < \infty$, then a divergence based prior under $M_2$ is
defined as

$$\pi^D(\theta) = c(q_*)^{-1}\, h_{q_*}\!\left(\bar{D}[\theta, \theta_0]\right) \pi^N(\theta). \qquad (13)$$

Note that, by definition, the DB priors either do not exist, or they are proper (and hence
they do not involve arbitrary constants).

Specific Proposals. Definition 2.1 is very general, in that several definitions of $\bar{D}$, $h_q$ and
$\pi^N$ could be explored (as well as different choices of $0 < \delta < 1$ in $q_* = \underline{q} + \delta$). We give specific choices
which, in part, are based on previous explorations and desired properties of the resulting $\pi^D$;
however, our specific choices are mainly intended to reproduce JZS priors in normal scenarios,
so that our proposals for DB priors can be best contemplated as extensions of JZS priors to
non-normal scenarios.

In what follows, we take $D$ to be either $D^S$ in (6) or $D^M$ in (7), and $h_q(t) = (1 + t)^{-q}$. Since
we will explore both, we need different notations:

Definition 2.2. (Sum and Minimum DB priors) The sum DB prior $\pi^S$ and the minimum
DB prior $\pi^M$ are the DB priors given in Definition 2.1 with $h_q(t) = (1 + t)^{-q}$ and $D$ being
respectively $D^S$ (see (6)) and $D^M$ (see (7)). When needed, we refer to their corresponding $c$'s
and $q$'s as $c_S, \underline{q}^S, q_*^S$, and $c_M, \underline{q}^M, q_*^M$, respectively.

It can easily be shown that $c_S(q) \le c_M(q)$, so that, for regular problems (in which $D^S < \infty$),
$q_*^M < \infty$ implies $q_*^S < \infty$, and therefore, in these problems, existence of $\pi^M$ implies existence of
$\pi^S$.
It should be noted that, although we are not explicitly assuming a specific objective prior
$\pi^N$ in the definition of DB priors, properties of $\pi^N$ are inherited by the DB prior $\pi^D$; some
properties will be crucial for sensible DB priors, and hence appropriate choice of $\pi^N$ becomes
very important.

We now explore some appealing properties of DB priors. Since these are common to both
proposals in Definition 2.2, we drop unneeded super and sub indexes and refer to the prior
simply as $\pi^D$. This convention will be kept throughout the paper; distinction between $\pi^S$ and $\pi^M$
will only be made when needed.

Local behavior of DB priors. It can be easily checked that, when $\pi^N(\theta) = 1$ (as when $\theta$ is
a location parameter), then the mode of $\pi^D$ is $\theta_0$ (so $\pi^D$ is 'centered' at the simplest model). We
can also exploit the following (well known) approximate relationship between Kullback-Leibler
divergence and Fisher information (see Kullback, 1968): for $\theta$ in a neighborhood of $\theta_0$,

$$KL[\theta_0 : \theta] \approx \frac{1}{2}(\theta - \theta_0)^t J(\theta_0)(\theta - \theta_0),$$

where $J(\theta_0)$ is the expected Fisher information matrix evaluated at $\theta_0$. Hence, in a neighborhood
of $\theta_0$, the DB priors approximately behave as $k$-variate Student distributions, centered at
$\theta_0$, and scaled by the Fisher information matrix under the simpler model. That is,

$$\pi^D(\theta) \approx St_k\!\left(\theta_0,\; n^* J(\theta_0)^{-1}/d,\; d\right),$$

where $d = 2\underline{q} - k + 1$. Moreover, by definition of $q_*$, $d$ above is generally close to 1, and then
the DB priors would approximately be Cauchy.

As highlighted in Section 4.3.2, the approximation above holds exactly in Normal scenarios
with $d = 1$, and hence the DB priors reproduce precisely the proposals of Jeffreys-Zellner-Siow.

Invariance under one-to-one transformations. An important question is whether the DB
priors are invariant under reparameterizations of the problem. Suppose that $\xi = \xi(\theta)$ is a
one-to-one monotone mapping $\xi : \Theta \to \Theta_\xi$. The model selection problem (4) now becomes:

$$M_1^*: f_1^*(y) = f^*(y \mid \xi_0) \quad \text{vs.} \quad M_2^*: f_2^*(y \mid \xi) = f^*(y \mid \xi), \qquad (14)$$

where $f^*(y \mid \xi(\theta)) = f(y \mid \theta)$ and $\xi_0 = \xi(\theta_0)$. The next result shows that, if $\pi^N$ is invariant
under the reparameterization $\xi(\theta)$, then so are the DB priors.

Proposition 1. Let $\pi^D_\theta(\theta)$ and $\pi^D_\xi(\xi)$ denote the DB priors for the original (4) and
reparameterized (14) problems respectively. If $\pi^N_\theta(\theta) \propto \pi^N_\xi(\xi(\theta))\,|J_\xi(\theta)|$, where $J_\xi$ is the Jacobian of the
transformation, then

$$\pi^D_\theta(\theta) = \pi^D_\xi(\xi(\theta))\,|J_\xi(\theta)|.$$

Proof. See Appendix. □

Under the conditions of Proposition 1, Bayes factors computed from DB priors are not
affected by reparameterizations. It is important to note that invariance of DB priors is a direct
consequence of both the invariance of the divergence measure used and the invariance of $\pi^N$.
Some objective priors invariant under reparameterizations are Jeffreys' priors and (partially)
the reference priors.



Compatibility with sufficient statistics. DB priors are sometimes compatible with reduction
of the data via sufficient statistics. This attractive property is not shared by other objective
Bayesian methods, such as intrinsic Bayes factors.

Proposition 2. Let $t = t(y)$ be a sufficient statistic for $\theta$ in $f(y \mid \theta)$ with distribution $f^*(t \mid \theta)$.
Assume that $\pi^N$ and $n^*$ remain the same in the problem defined by $f^*$; then the DB prior $\pi^D$
for the original problem (4) is the same as the DB prior for the reduced (by sufficiency) testing
problem

$$M_1^*: f_1^*(t) = f^*(t \mid \theta_0) \quad \text{vs.} \quad M_2^*: f_2^*(t \mid \theta) = f^*(t \mid \theta). \qquad (15)$$

Proof. See Appendix. □



DB priors and Jeffreys' general rule. Jeffreys (1961) proposed objective proper priors for
testing situations other than the normal mean. Specifically, when $y$ is a random sample of size
$n$, and for univariate $\theta$, he proposed the following model testing prior:

$$\pi^J(\theta) = \frac{1}{\pi}\,\frac{d}{d\theta}\tan^{-1}\!\left(\frac{D^S[\theta, \theta_0]}{n}\right)^{1/2} = \frac{1}{\pi}\left(1 + \frac{D^S[\theta, \theta_0]}{n}\right)^{-1} \frac{d}{d\theta}\left(\frac{D^S[\theta, \theta_0]}{n}\right)^{1/2}. \qquad (16)$$

This reduces to Jeffreys' Cauchy proposal when $\theta$ is a normal mean. Also, when $|\theta - \theta_0|$ is
small, $\pi^J(\theta)$ can be approximated by

$$\pi^J(\theta) \approx \frac{1}{\pi}\left(1 + \bar{D}^S[\theta, \theta_0]\right)^{-1} \pi^N(\theta), \qquad (17)$$

where $\pi^N(\theta)$ is Jeffreys' (estimation) prior (i.e. the square root of the expected Fisher information).

Note that $\pi^J$ can lead to improper priors and, at least in principle, cannot be applied for
multivariate parameters. However, the approximation (17) was a main inspiration for the definition
of DB priors, with clear similarities between them.

3 Comparative examples: simple null 

In the spirit of Berger and Pericchi (2001) we investigate in this section the performance of DB 
priors in a series of situations chosen to be somewhat representative of wider classes of statistical
problems. We also explicitly derive well established, alternative proposals for objective priors 
in Bayesian hypothesis testing and compare their performance with that of DB priors. We 
show that in simple standard situations, DB priors produce similar results to these alternative 
proposals. More interestingly, in more sophisticated situations where these proposals fail (models 
with irregular asymptotics or improper likelihoods), the DB priors are well defined and very 
sensible. 

We will compute and compare Bayes factors derived with DB priors, with those derived with
two of the most popular general objective priors for objective Bayes model selection, namely:

1. Arithmetic intrinsic prior:

$$\pi^A(\theta) = \pi^N(\theta)\, E_\theta^{M_2}\!\left[B_{12}^N(y^*)\right],$$

where the Bayes factor $B_{12}^N$ is computed with the objective estimation prior $\pi^N$, and $y^*$ is
an imaginary sample of minimum size such that $0 < m_2^N(y^*) < \infty$.

2. Fractional intrinsic prior:

$$\pi^F(\theta) = \pi^N(\theta)\, \frac{\exp\{m\, E_\theta^{M_2} \log f(y \mid \theta_0)\}}{\int \exp\{m\, E_\theta^{M_2} \log f(y \mid \tilde\theta)\}\, \pi^N(\tilde\theta)\, d\tilde\theta}.$$

In the iid case and asymptotically, $\pi^A$ produces the arithmetic intrinsic Bayes factor (Berger
and Pericchi, 1996), and $\pi^F$ the fractional Bayes factor (O'Hagan, 1995) if the exponent of
the likelihood is $b = m/n$ for a fixed $m$ (see De Santis and Spezaferri, 1999). Following the
recommendation of Berger and Pericchi (2001) we take $m$ to be the size of the minimal training
sample $y^*$.

In the examples of this Section, $y$ is an iid sample of size $n$ from $f(y \mid \theta)$, and unless otherwise
specified, $n^* = n$ ($n^*$ denotes effective sample size). We let $B_{12}^S$ denote the Bayes factor in favor
of $H_1$ computed with $\pi^S$ (see Definition 2.2); $B_{12}^M$, $B_{12}^A$ and $B_{12}^F$ are defined similarly.
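3.1 Bounded parameter space (Example 1)

We begin with a simple example, in which the data are a random sample from a Bernoulli distribution,
that is

$$f(y \mid \theta) = \theta^{y}(1 - \theta)^{1-y}, \quad y \in \{0, 1\}, \quad \theta \in \Theta = [0, 1],$$

and we want to test $M_1: \theta = \theta_0$ versus $M_2: \theta \neq \theta_0$. The usual estimation objective prior
(both reference and Jeffreys) in this problem is the beta density $\pi^N(\theta) = Be(\theta \mid 1/2, 1/2) \propto
\theta^{-1/2}(1 - \theta)^{-1/2}$. In this case, since $\pi^N$ is proper, it would be tempting to use it as a testing
prior. However, we will see that all of $\pi^S$, $\pi^M$, $\pi^A$ and $\pi^F$ center around the null value $\theta_0$ whereas
the estimation prior completely ignores it.

The DB prior for the sum-symmetrized divergence can be computed to be

$$\pi^S(\theta) \propto \left(1 + (\theta - \theta_0)\log\frac{\theta(1 - \theta_0)}{\theta_0(1 - \theta)}\right)^{-1/2} \pi^N(\theta),$$

and the DB prior for the min-symmetrized divergence is

$$\pi^M(\theta) \propto \left(1 + \bar{D}^M[\theta, \theta_0]\right)^{-1/2} \pi^N(\theta),$$

where

$$\bar{D}^M[\theta, \theta_0] = \begin{cases} 2\,KL[\theta : \theta_0] & \text{if } \min\{\theta_0, 1 - \theta_0\} < \theta < \max\{\theta_0, 1 - \theta_0\}, \\ 2\,KL[\theta_0 : \theta] & \text{otherwise,} \end{cases}$$

and $KL[\theta : \theta_0] = \theta_0 \log\dfrac{\theta_0}{\theta} + (1 - \theta_0)\log\dfrac{1 - \theta_0}{1 - \theta}$.

The intrinsic priors are derived in the next result. The proof is straightforward and hence it
is omitted.

Lemma 3.1. The arithmetic intrinsic prior is

$$\pi^A(\theta) = 2\left((1 - \theta_0)(1 - \theta) + \theta_0\theta\right) \pi^N(\theta)$$

and the fractional intrinsic prior is

$$\pi^F(\theta) = \frac{\pi\, \theta_0^{\theta}(1 - \theta_0)^{1-\theta}}{\Gamma(\theta + 1/2)\,\Gamma(3/2 - \theta)}\, \pi^N(\theta).$$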

By construction, $\pi^S$ and $\pi^M$ are proper priors; $\pi^A$ is proper but $\pi^F$ is not. For instance, for
$\theta_0 = 1/2$, $\pi^F$ integrates to 1.28 and for $\theta_0 = 3/4$, $\pi^F$ integrates to 1.18. This implies a small
bias in the Bayes factor in favor of $M_2$. In Figure 1 we display $\pi^S$, $\pi^M$, $\pi^A$ and $\pi^F$ for $\theta_0 = 1/2$
and $\theta_0 = 3/4$. They can be seen to be very similar. When $\theta_0 = 1/2$ they are also similar to the
objective estimation prior $Be(\theta \mid 1/2, 1/2)$, but not for other values of $\theta_0$.

Figure 1: In Bernoulli example: $\pi^S$ (Solid line), $\pi^M$ (Dot-dashed line), $\pi^A$ (Dots) and $\pi^F$
(Dashed line), for the case $\theta_0 = 1/2$ (left) and $\theta_0 = 3/4$ (right). Horizontal axis: $\theta$.

Table 1: Bayes factors in favor of $M_1$ for Bernoulli testing of $\theta_0 = 1/2$, for different values of
the MLE and $n = 10$, $n = 100$. Also, Bayes factors for Conover data. (Surviving entries: 19.38,
20.20, 20.79, 16.02.)
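The statements about propriety can be checked numerically. The Python sketch below integrates the two intrinsic priors over $[0, 1]$, taking as assumptions the closed forms $\pi^A(\theta) = 2((1-\theta_0)(1-\theta) + \theta_0\theta)\,\pi^N(\theta)$ and $\pi^F(\theta) = \pi\,\theta_0^{\theta}(1-\theta_0)^{1-\theta}\,\pi^N(\theta) / (\Gamma(\theta + 1/2)\Gamma(3/2 - \theta))$ as read from Lemma 3.1. The function names and the midpoint rule (applied after the substitution $\theta = \sin^2\phi$, which removes the endpoint singularities of $\pi^N$) are choices made here for illustration.

```python
import math

def reference_prior(theta: float) -> float:
    """Be(theta | 1/2, 1/2) density."""
    return 1.0 / (math.pi * math.sqrt(theta * (1.0 - theta)))

def arithmetic_intrinsic(theta: float, theta0: float) -> float:
    """Arithmetic intrinsic prior, closed form assumed from Lemma 3.1."""
    return 2.0 * ((1.0 - theta0) * (1.0 - theta) + theta0 * theta) * reference_prior(theta)

def fractional_intrinsic(theta: float, theta0: float) -> float:
    """Fractional intrinsic prior, closed form assumed from Lemma 3.1."""
    weight = math.pi * theta0 ** theta * (1.0 - theta0) ** (1.0 - theta)
    return weight / (math.gamma(theta + 0.5) * math.gamma(1.5 - theta)) * reference_prior(theta)

def total_mass(density, theta0: float, n_grid: int = 20_000) -> float:
    """Integrate density(., theta0) over (0, 1) with theta = sin(phi)^2 and a midpoint rule."""
    h = (math.pi / 2.0) / n_grid
    total = 0.0
    for i in range(n_grid):
        phi = (i + 0.5) * h
        theta = math.sin(phi) ** 2
        jacobian = 2.0 * math.sin(phi) * math.cos(phi)  # d(theta)/d(phi)
        total += density(theta, theta0) * jacobian
    return total * h
```

Calling `total_mass(fractional_intrinsic, 0.5)` and `total_mass(fractional_intrinsic, 0.75)` gives a way to compare against the masses quoted in the text, while `total_mass(arithmetic_intrinsic, theta0)` should be close to 1 for any $\theta_0$.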

We also compute the Bayes factors for the four different priors, when $\theta_0 = 1/2$, for two
different sample sizes, $n = 10$ and $n = 100$, and for different values of the MLE, $\hat\theta = \sum_{i=1}^{n} y_i/n$
(see Table 1). All the results are quite similar. As expected, $B_{12}^F$ gives the most support to $M_2$;
$B_{12}^A$ gives the least. Both DB priors produce similar results, being slightly closer to $B_{12}^A$ than to
$B_{12}^F$.

Finally, we consider application to real data taken from Conover (1971). Under the hypothesis
of simple Mendelian inheritance, a cross between two particular plants produces, in a
proportion of $\theta = 3/4$, a species called 'giant'. To determine whether this assumption is true,
Conover (1971) crossed $n = 925$ pairs of plants, getting $T = 682$ giant plants. The Bayes factors
in favor of the Mendelian inheritance hypothesis (simplest model) are also given in Table 1
for the four different priors. Again the results are very similar, the fractional intrinsic prior
providing the least support to $M_1$.
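For the arithmetic intrinsic prior the Bayes factor on such data has a closed form, which makes a convenient check. The sketch below assumes $\pi^A(\theta) = 2((1-\theta_0)(1-\theta) + \theta_0\theta)\,Be(\theta \mid 1/2, 1/2)$ (the form read from Lemma 3.1), so that the marginal under $M_2$ is a combination of two Beta functions; the function names are hypothetical and the computation is done on the log scale to avoid underflow for samples as large as Conover's.

```python
import math

def log_beta(a: float, b: float) -> float:
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bayes_factor_arithmetic_intrinsic(successes: int, n: int, theta0: float) -> float:
    """B12 for M1: theta = theta0 against M2 under the (assumed) arithmetic intrinsic prior."""
    failures = n - successes
    log_m1 = successes * math.log(theta0) + failures * math.log(1.0 - theta0)
    # m2 = (2/pi) * [(1 - theta0) * B(s + 1/2, f + 3/2) + theta0 * B(s + 3/2, f + 1/2)]
    term_a = math.log(1.0 - theta0) + log_beta(successes + 0.5, failures + 1.5)
    term_b = math.log(theta0) + log_beta(successes + 1.5, failures + 0.5)
    top = max(term_a, term_b)
    log_m2 = (math.log(2.0 / math.pi) + top
              + math.log(math.exp(term_a - top) + math.exp(term_b - top)))
    return math.exp(log_m1 - log_m2)

# Conover data: T = 682 giant plants out of n = 925 crosses, null value 3/4.
b12_conover = bayes_factor_arithmetic_intrinsic(682, 925, 0.75)
```

The binomial coefficient is omitted because it cancels between the two marginals. With no data at all the function returns 1, which reflects the fact that $\pi^A$ is proper.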
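Figure 2: $\pi^S$ (upper left), $\pi^M$ (upper right), $\pi^A$ (lower left) and $\pi^F$ (lower right) for the
Exponential testing of $\mu_0 = 5$.

3.2 Scale parameter (Example 2)

We next consider another simple example of testing a scale parameter. Specifically, we consider
that the data come from the one-parameter exponential model with mean $\mu$, that is,

$$f(y \mid \mu) = Exp\!\left(y \,\Big|\, \frac{1}{\mu}\right) = \frac{1}{\mu}\exp\!\left\{-\frac{y}{\mu}\right\}, \quad y > 0, \ \mu > 0,$$

and that it is desired to test $H_1: \mu = \mu_0$ vs. $H_2: \mu \neq \mu_0$. Here $\pi^N(\mu) = \mu^{-1}$, and the DB
priors are computed to be:

$$\pi^S(\mu) \propto \left(1 + \bar{D}^S[\mu, \mu_0]\right)^{-q_*^S} \mu^{-1}, \qquad \pi^M(\mu) \propto \left(1 + \bar{D}^M[\mu, \mu_0]\right)^{-q_*^M} \mu^{-1},$$

where

$$\bar{D}^M[\mu, \mu_0] = \begin{cases} 2\,KL[\mu_0 : \mu] & \text{if } \mu > \mu_0, \\ 2\,KL[\mu : \mu_0] & \text{if } \mu < \mu_0, \end{cases}$$

and $KL[\mu : \mu_0] = \log(\mu_0/\mu) - (\mu_0 - \mu)/\mu_0$. The intrinsic priors are given in the next lemma
(the proof is straightforward and is omitted):

Lemma 3.2. The arithmetic and fractional intrinsic priors are

$$\pi^A(\mu) = \mu_0^{-1}\left(1 + \frac{\mu}{\mu_0}\right)^{-2}, \qquad \pi^F(\mu) = \mu_0^{-1}\exp\!\left\{-\frac{\mu}{\mu_0}\right\} = Exp\!\left(\mu \,\Big|\, \frac{1}{\mu_0}\right).$$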
The four priors are shown in Figure 2 when testing $\mu_0 = 5$. They all have similar shapes,
although that of $\pi^F$ is somewhat unusual; they have some interesting properties:

1. In the log scale, both $\pi^M$ and $\pi^S$ are symmetric around $\log\mu_0$; this is in accordance with the
Berger and Delampady (1987) and Berger and Sellke (1987) proposals, since $\log(\mu)$ is a
location parameter.

2. All four priors are proper.

3. Neither the arithmetic intrinsic nor the DB priors have moments; the fractional intrinsic
has all the moments.

4. $\pi^M$ has the heaviest tails, and $\pi^F$ the thinnest; $\pi^S$ has heavier tails than $\pi^A$.

5. All four priors are 'centered' at the null value $\mu_0$; indeed, $\mu_0$ is the median of the DB priors
and of $\pi^A$, and it is the mean of $\pi^F$.

The four Bayes factors $B_{12}$ in favour of $M_1: \mu = 5$ appear in Table 2 for two values of
$n$ ($n = 10$ and $n = 100$) and a few values of the MLE $\hat\mu = \sum_{i=1}^{n} y_i/n \in \{5, 7.5, 2.5\}$. We
again find very similar results for the different priors, with $B_{12}^S$ and $B_{12}^A$ providing slightly more
support to $M_1$ than $B_{12}^M$ and $B_{12}^F$ when data is compatible with $M_1$.

            MLE    B_12^S          B_12^M          B_12^A          B_12^F
n = 10      5      5.65            4.43            5.13            3.59
            7.5    2.36            2.02            2.09            1.58
            2.5    0.95            0.88            0.82            0.59
n = 100     5      17.28           12.81           15.98           10.89
            7.5    14.6 x 10^-4    12.2 x 10^-4    13 x 10^-4      9.4 x 10^-4
            2.5    0.86 x 10^-7    0.83 x 10^-7    0.73 x 10^-7    0.54 x 10^-7

Table 2: Bayes factors for the exponential testing with $\mu_0 = 5$ for different values of the MLE
and $n = 10$, $n = 100$.
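The intrinsic-prior entries of Table 2 can be approximated with a short numerical integration. The Python sketch below is an illustration under stated assumptions rather than the authors' computation: it takes $\pi^A(\mu) = \mu_0^{-1}(1 + \mu/\mu_0)^{-2}$ and $\pi^F(\mu) = \mu_0^{-1}\exp\{-\mu/\mu_0\}$ as read from Lemma 3.2, writes $1/B_{12} = \int \exp\{n[\log(\mu_0/\mu) - \bar{y}/\mu + \bar{y}/\mu_0]\}\,\pi(\mu)\,d\mu$, and applies a midpoint rule on $t = \log\mu$ over a window centered at $\log\bar{y}$. Function names, window width and grid size are arbitrary choices.

```python
import math

def prior_arithmetic(mu: float, mu0: float) -> float:
    """Arithmetic intrinsic prior for the exponential mean (assumed closed form)."""
    return (1.0 / mu0) * (1.0 + mu / mu0) ** -2

def prior_fractional(mu: float, mu0: float) -> float:
    """Fractional intrinsic prior: exponential density with mean mu0 (assumed closed form)."""
    return math.exp(-mu / mu0) / mu0

def bayes_factor_12(prior, n: int, ybar: float, mu0: float,
                    half_width: float = 12.0, n_grid: int = 20_000) -> float:
    """B12 = m1 / m2 for an exponential sample with mean ybar, by a midpoint rule in log(mu)."""
    center = math.log(ybar)
    h = 2.0 * half_width / n_grid
    inverse_bf = 0.0
    for i in range(n_grid):
        t = center - half_width + (i + 0.5) * h
        mu = math.exp(t)
        # log of the likelihood ratio f(y | mu) / f(y | mu0) for the whole sample
        log_ratio = n * (math.log(mu0 / mu) - ybar / mu + ybar / mu0)
        inverse_bf += math.exp(log_ratio) * prior(mu, mu0) * mu  # extra mu: d(mu) = mu dt
    return 1.0 / (inverse_bf * h)
```

For example, `bayes_factor_12(prior_fractional, 100, 5.0, 5.0)` and `bayes_factor_12(prior_arithmetic, 100, 5.0, 5.0)` can be compared with the $n = 100$, $\hat\mu = 5$ row of Table 2.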

We next investigate a desirable property of Bayes factors which often fails when they are
computed using conjugate priors (see Berger and Pericchi, 2001). It is natural to expect that,
for any given sample size, $B_{12} \to 0$ as the evidence against the simpler model $M_1$ becomes
overwhelming. When this property holds, we say that the Bayes factor is evidence consistent (or
finite sample consistent). It is easy to show that, if $\bar{y} \to \infty$ then $B_{12} \to 0$ for all $n$, no matter what
prior is used to obtain the Bayes factor. The following lemma provides sufficient conditions for
$B_{12} \to 0$ as $\bar{y} \to 0$.

Lemma 3.3. Let $B_{12}^\pi$ be the Bayes factor computed with $\pi(\mu)$. $B_{12}^\pi \to 0$ as $\bar{y} \to 0$, for all
$n \ge k > 0$, if and only if

$$\int_0^\infty \mu^{-k}\,\pi(\mu)\,d\mu = \infty.$$

Proof. See Appendix. □

It follows that all four priors considered produce evidence consistent Bayes factors for all
$n \ge 1$. Evidence consistency provides further insight into the behaviour of the DB priors.
Indeed, we recall that in the general definition of DB priors we used the power $\underline{q} + \delta$, and then
we recommended the specific choice $\delta = 0.5$. Interestingly, if $\delta > 1$ is used instead, then $\pi^S$
would not be evidence consistent as $\bar{y} \to 0$.

Last, we study the behavior of $B_{12}$ as the evidence in favor of $M_1$ grows (that is, as $\bar{y} \to \mu_0$).
For this example, it is easy to show that, when $\bar{y} \to \mu_0$, $B_{12}$ grows to a constant, $B_{12}^0(n, \pi)$
say, that depends only on $n$ and the prior used. Of course, it then follows from the dominated
convergence theorem that $B_{12}^0(n, \pi) \to \infty$ with $n$, but this also follows from general consistency
of Bayes factors (for proper, fixed priors), so it is not very interesting. Of more interest for our
comparison is to study how fast $B_{12}^0(n, \pi)$ goes to $\infty$. In Figure 3 we show $B_{12}^0(n, \pi)$ for the
four priors considered. It can be seen that $\pi^S$ is the one producing the largest values of $B_{12}$ for
all values of $n$, with those for $\pi^A$ following very closely.

Figure 3: Upper bounds $B_{12}^0(n, \pi)$ of Bayes factors as a function of $n$ for the priors $\pi^S$ (Solid
line), $\pi^M$ (Dot-dashed line), $\pi^F$ (Dashed line), and $\pi^A$ (Dots).

3.3 Location-scale (Example 3)

DB priors are defined in general for vector parameters $\theta$. As an illustration, we next consider
a most popular example, namely the normal distribution; here the 2-dimensional $\theta$ has two
components of different nature (location and scale). Specifically, assume that

$$f(y \mid \mu, \sigma) = N(y \mid \mu, \sigma^2)$$

and that we want to test $M_1: (\mu, \sigma) = (\mu_0, \sigma_0)$ versus $M_2: (\mu, \sigma) \neq (\mu_0, \sigma_0)$. This hypothesis
testing problem occurs often in statistical process control, where a production process is considered
'in control' if its production outputs have a specified mean and standard deviation (the
so called nominal values); the question of interest is whether the process is in control, that is,
whether the mean and variance are equal to the nominal values.

Figure 4: $\pi^S$ for the Normal problem, with $\mu_0 = 0$, $\sigma_0 = 1$.

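To compute the DB priors we use the reference prior $\pi^N(\mu,\sigma) = \sigma^{-1}$; for the
sum-DB prior we get:

$$\pi^S(\mu,\sigma) = \pi^S(\sigma)\,\pi^S(\mu \mid \sigma), \qquad
\pi^S(\sigma) \propto \frac{\sigma}{(\sigma^4+\sigma_0^4)^{1/2}\,(\sigma^2+\sigma_0^2)^{1/2}},$$

and

$$\pi^S(\mu \mid \sigma) = Ca(\mu \mid \mu_0, \Sigma), \qquad
\Sigma = \frac{\sigma^4+\sigma_0^4}{\sigma^2+\sigma_0^2},$$

where $Ca$ represents the Cauchy density. In this example, the minimum-DB prior $\pi^M$ does
not exist, since $\underline{q}^M = \infty$. It can be checked that $\pi^S(\mu \mid \sigma)$ is
symmetric around $\mu_0$, which is a location parameter in $\pi^S(\mu \mid \sigma)$; $\sigma_0$
is a scale parameter in $\pi^S(\sigma)$. The joint density $\pi^S$ is shown in Figure 4.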
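
As a numerical cross-check of the marginal sum-DB prior for $\sigma$, the following sketch locates its mode and median for $\sigma_0 = 1$. It assumes the kernel $\sigma / \{(\sigma^4+\sigma_0^4)(\sigma^2+\sigma_0^2)\}^{1/2}$, which is what the DB-prior definition yields here with $\pi^N(\mu,\sigma) = \sigma^{-1}$ and exponent $q_* = 1$; the function names and quadrature settings are illustrative choices, not part of the paper.

```python
import math

def sigma_kernel(s, s0=1.0):
    """Unnormalised sum-DB marginal for sigma (assumed form, see lead-in)."""
    return s / math.sqrt((s**4 + s0**4) * (s**2 + s0**2))

def midpoint(f, a, b, steps=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def total_mass(s0=1.0):
    # Map (0, inf) to (0, 1) with s = x / (1 - x), ds = dx / (1 - x)^2.
    return midpoint(lambda x: sigma_kernel(x / (1.0 - x), s0) / (1.0 - x) ** 2, 0.0, 1.0)

def mode(s0=1.0):
    # Ternary search; the kernel has a single maximum on (0, inf).
    lo, hi = 0.01 * s0, 5.0 * s0
    for _ in range(200):
        m1, m2 = lo + (hi - lo) / 3.0, hi - (hi - lo) / 3.0
        if sigma_kernel(m1, s0) < sigma_kernel(m2, s0):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

def median(s0=1.0):
    # Bisection on the cumulative mass.
    half = 0.5 * total_mass(s0)
    lo, hi = 0.0, 50.0 * s0
    for _ in range(30):
        mid = 0.5 * (lo + hi)
        if midpoint(lambda s: sigma_kernel(s, s0), 0.0, mid) < half:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(round(mode(), 2), round(median(), 2))
```

Under these assumptions the mode comes out near 0.81 and the median close to 1.56, in line with the pair quoted for the sum-DB prior in the caption of Figure 5.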

The intrinsic priors, which have simpler forms and thinner tails, are derived next (the proof 
is omitted): 

Lemma 3.4. The arithmetic intrinsic prior is 



and the fractional intrinsic prior is 

7r^(/.,a) = iV+(a|0,^) iV(^|;xo,^), 

where $N^{+}$ stands for the normal density truncated to the positive real line.

The intrinsic priors are proper; also, as with the sum-DB prior, $\mu_0$ and $\sigma_0$ are
location and scale parameters for $\mu \mid \sigma$ and $\sigma$ respectively. Under the
fractional intrinsic prior $\pi^F$, $\mu$ and $\sigma$ are independent a priori.

Values of $B_{12}$ for all three priors and different values of the sufficient statistic
$(\bar{y}, S)$ are given in Table 3 when $(\mu_0,\sigma_0) = (0,1)$. The Bayes factors
corresponding to the different priors can be seen to be quite similar, specially, once again,
$B_{12}^S$ and $B_{12}^A$.

              ybar: 1st value            ybar: 2nd value                     ybar: 3rd value
  S         B^S     B^A     B^F       B^S       B^A       B^F            B^S         B^A         B^F
  0.5      2.30    1.35    0.70      0.03      0.02      0.01          3·10^-?     4·10^-?     6·10^-?
  1       18.67   18.55   11.72      0.21      0.19      0.18          1·10^-?     2·10^-?     6·10^-?
  2       0.006   0.006   0.017    5·10^-?   5·10^-?  21·10^-?         2·10^-?     2·10^-?    41·10^-?

Table 3: For multidimensional parameter problem ($\mu_0 = 0$, $\sigma_0 = 1$), values of
$B_{12}$ for different values of $(\bar{y}, S)$ with $n = 10$.

Figure 5: Marginal distributions of $\sigma$ when $(\mu_0,\sigma_0) = (0,1)$; $\pi^S(\sigma)$
(solid line), $\pi^A(\sigma)$ (dots), and $\pi^F(\sigma)$ (dashed line). The pair (mode, median)
for these priors are (0.81, 1.56) for $\pi^S$, (0, 1) for $\pi^A$, and (0, 0.48) for $\pi^F$.

Figure 6: Conditional distributions of $\mu$ given $\sigma = 1$ (left) and $\sigma = 3$ (right)
when $(\mu_0,\sigma_0) = (0,1)$; $\pi^S$ (solid), $\pi^A$ (dots), and $\pi^F$ (dashed).

For the three priors, we display in Figure 5 the marginal distributions of $\sigma$ and in
Figure 6 the conditional distributions of $\mu$ given $\sigma$. It can clearly be seen that
$\pi^S(\sigma)$ has thicker tails than the intrinsic priors (recall, thicker tails seem to
perform better for testing). Also, all conditional priors for $\mu$ are symmetric around their
mode $\mu_0$, with $\pi^S(\mu \mid \sigma)$ having the heaviest tails.

Figure 7: Upper bounds $\bar{B}_{12}(n,\pi)$ of Bayes factors as a function of $n$ for the
priors considered (solid, dot-dashed, dashed and dotted lines).

With respect to the evidence consistency of the Bayes factors, it is easy to show that when
either $\bar{y} \to \infty$, $\bar{y} \to -\infty$ or $S \to \infty$ (the evidence against $M_1$
is very strong), then $B_{12} \to 0$, $\forall n$ and for the three priors considered. When the
evidence in favor of $M_1$ is largest (that is, $(\bar{y}, S) \to (\mu_0,\sigma_0)$) it can be
seen (with a change of variables) that the Bayes factor in favor of $M_1$ grows to
$\bar{B}_{12}(n,\pi)$,

Bl,in,7r) = y"/3-"exp{-ni±^^^^}vr^,(a,/3), 

a function only of $n$ and the prior $\pi^j$ ($j = A, F, S$) used. For the arithmetic intrinsic
and fractional priors, the mixing densities $\pi^j$ are:

and for the sum DB prior:

7rf(a,/3) = ^(l + /34 + /3V(l + /32))-\ / ^((1 + s^)(l + s2))-i/2 ^5. 

Figure 7 illustrates the rate at which $\bar{B}_{12}(n,\pi) \to \infty$ as $n \to \infty$. It
can be clearly seen that, as in the previous example, the DB and intrinsic priors behave very
similarly, being more sensitive to the evidence in favor of $M_1$ than the fractional prior,
substantially so unless $n$ is very small.

Finally we compare the behavior of the three priors in a real example taken from Montgomery
(2001). The example refers to controlling the piston ring for an automotive engine production
process. The process was considered to be in control if the mean and the standard deviation of
the inside diameter (in millimeters) of the pistons were $\mu_0 = 74.001$ and
$\sigma_0 = 0.0099$. At some specific time, the following sample was taken from the process:

74.035, 74.010, 74.012, 74.015, 74.026,

and it had to be checked whether the process was in control. Bayes factors are given in Table 4.
$B_{12}^F$ provides about twice as much support to $M_1$ as $B_{12}^S$ and $B_{12}^A$, which
are very similar to each other.

  B^S_12    B^A_12    B^F_12
  0.004     0.005     0.011

Table 4: Bayes factors $B_{12}$ for Montgomery (2001) example.
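
To see why all three Bayes factors point so clearly against $M_1$, it helps to look at the sufficient statistic for the quoted sample. The sketch below computes it; taking $S$ as the maximum likelihood estimate (divisor $n$) is an assumption made here, and the variable names are illustrative.

```python
import math

# Sample and nominal values as quoted in the text (Montgomery, 2001).
sample = [74.035, 74.010, 74.012, 74.015, 74.026]
mu0, sigma0 = 74.001, 0.0099

n = len(sample)
ybar = sum(sample) / n
# Assumption: S is the maximum likelihood estimate (divisor n, not n - 1).
s_mle = math.sqrt(sum((y - ybar) ** 2 for y in sample) / n)
# Distance of the sample mean from mu0 in units of sigma0 / sqrt(n).
z = (ybar - mu0) / (sigma0 / math.sqrt(n))

print(f"ybar = {ybar:.4f}, S = {s_mle:.4f}, z = {z:.2f}")
```

The sample standard deviation is close to the nominal $\sigma_0$, but the sample mean sits roughly four nominal standard errors above $\mu_0$, so the departure from $M_1$ is driven by the location component.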



3.4 Irregular models (Example 4) 

There is an important class of models for which the parameter space is constrained by the data.
These models do not have regular asymptotics and hence solutions based on asymptotic theory
(like the Bayesian information criterion, BIC) do not apply. Moreover, these models are very
challenging for the intrinsic approach; indeed, as discussed in Berger and Pericchi (2001), the
fractional Bayes factor is completely unreasonable (and hence the fractional intrinsic prior is
useless), and the arithmetic intrinsic prior (which was only derived for the one-sided problem)
is "something of a conjecture" (authors' verbatim). We take here the simplest such model,
namely an exponential distribution with unknown location. Accordingly, assume that

$$f(y \mid \theta) = \exp\{-(y-\theta)\}, \qquad y > \theta,$$

and that we want to test $H_1 : \theta = \theta_0$ vs. $H_2 : \theta \neq \theta_0$. To the best
of our knowledge, no objective priors have been proposed for this testing problem in the
literature.

In these situations, the sum-symmetrized Kullback-Leibler divergence $D^S[\theta,\theta_0]$ is
$\infty$, so we have to use the minimum. It can be checked that
$\bar{D}^M[\theta,\theta_0] = 2|\theta - \theta_0|$, a well defined divergence. Also,
$\pi^N(\theta) = 1$ since $\theta$ is a location parameter. The minimum DB prior is then given by

$$\pi^M(\theta) = \tfrac{1}{2}\,(1 + 2|\theta-\theta_0|)^{-3/2}, \qquad \theta \in \mathbb{R},$$

which is symmetric with respect to $\theta_0$ (as expected, since $\theta$ is a location
parameter); also, $\pi^M$ has no moments. Figure 8 (left) shows $\pi^M(\theta)$ when
$\theta_0 = 0$.
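
The properness and the lack of moments of the minimum DB prior can be made concrete. The sketch below assumes the density $\tfrac{1}{2}(1+2|\theta-\theta_0|)^{-3/2}$ on the real line; the closed-form distribution function, quantile function and truncated absolute mean are elementary integrals worked out here for illustration (they are not given in the paper), and all function names are hypothetical.

```python
import math

def min_db_pdf(theta, theta0=0.0):
    """Two-sided minimum DB prior for the exponential-location example."""
    return 0.5 * (1.0 + 2.0 * abs(theta - theta0)) ** -1.5

def min_db_cdf(theta, theta0=0.0):
    """Distribution function obtained by integrating the density above."""
    t = theta - theta0
    tail = 0.5 / math.sqrt(1.0 + 2.0 * abs(t))  # mass beyond |t| on one side
    return 1.0 - tail if t >= 0.0 else tail

def min_db_quantile(u, theta0=0.0):
    """Inverse of min_db_cdf for 0 < u < 1."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    v = min(u, 1.0 - u)                          # one-sided tail probability
    dist = 0.5 * (1.0 / (4.0 * v * v) - 1.0)     # distance from theta0
    return theta0 + dist if u >= 0.5 else theta0 - dist

def truncated_abs_mean(limit):
    """Integral of |theta - theta0| * pdf over |theta - theta0| < limit."""
    w = 1.0 + 2.0 * limit
    return 0.5 * (math.sqrt(w) + 1.0 / math.sqrt(w) - 2.0)

# Quartiles of the prior and the growth of the truncated first absolute moment.
print(min_db_quantile(0.25), min_db_quantile(0.75))
print([round(truncated_abs_mean(L), 1) for L in (1e2, 1e4, 1e6)])
```

The distribution function tends to 1, so the prior is proper, while the truncated absolute mean grows like the square root of the truncation point, which is the sense in which the prior has no moments.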

We next investigate the evidence consistency for any $n$. The sufficient statistic is
$T = \min\{y_1,\dots,y_n\}$. It is trivially true that $B_{12} \to 0$ as $T \to -\infty$ for any
(proper) prior (in fact, $B_{12} = 0$ for $T < \theta_0$). The next lemma provides a sufficient
condition on the prior to produce evidence consistency $\forall n$, as $T \to \infty$.

Lemma 3.5. Let $\pi(\theta)$ be any proper prior (on $M_2$) and $B_{12}$ be the corresponding
Bayes factor. If for some integer $k > 0$

$$\int_{\theta_0}^{\infty} e^{k\theta}\,\pi(\theta)\,d\theta = \infty, \qquad (19)$$

then $B_{12} \to 0$ as $T \to \infty$, $\forall n \geq k$.

Proof. See Appendix. □

It follows from the previous lemma that $\pi^M$ produces evidence consistent Bayes factors
$\forall n \geq 1$. We next investigate the situation for increasing evidence in favor of $M_1$,
that is, as $T \to \theta_0^{+}$. Let

$$\bar{B}_{12}(n) = \lim_{T \to \theta_0^{+}} B_{12}.$$

$\bar{B}_{12}(n)$ is an upper bound of $B_{12}$ when the evidence in favor of $M_1$ is largest.
It can be seen in Figure 8 (right) that $\bar{B}_{12}(n)$ is nearly linear. Of course
$\bar{B}_{12}(n) \to \infty$ when $n \to \infty$.
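
The near-linear growth of the upper bound can be checked numerically. The sketch below assumes that, for $T > \theta_0$, the Bayes factor is $B_{12} = \{\int_{-\infty}^{T} e^{n(\theta-\theta_0)}\pi^M(\theta)\,d\theta\}^{-1}$, so that the bound is the reciprocal of $\int_0^\infty e^{-nt}\,\tfrac{1}{2}(1+2t)^{-3/2}\,dt$. The closed form in terms of the complementary error function is a rearrangement made here (integration by parts), offered as an assumption and verified against plain quadrature; the function names are illustrative.

```python
import math

def upper_bound_closed_form(n):
    """Reciprocal of 0.5 * (1 - sqrt(pi n / 2) * exp(n / 2) * erfc(sqrt(n / 2)))."""
    inner = math.sqrt(math.pi * n / 2.0) * math.exp(n / 2.0) * math.erfc(math.sqrt(n / 2.0))
    return 1.0 / (0.5 * (1.0 - inner))

def upper_bound_quadrature(n, steps=200_000):
    """Same quantity by the midpoint rule on [0, 60 / n]."""
    h = (60.0 / n) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += math.exp(-n * t) * 0.5 * (1.0 + 2.0 * t) ** -1.5
    return 1.0 / (total * h)

for n in (1, 10, 50, 100):
    print(n, round(upper_bound_closed_form(n), 2), round(upper_bound_quadrature(n), 2))
```

Because the prior density is at most 1/2, the integral is below $1/(2n)$ and the bound always exceeds $2n$; for large $n$ it stays within a roughly constant offset of $2n$, which is consistent with the nearly linear shape described for Figure 8. The closed form overflows for $n$ in the low thousands, where the quadrature version should be preferred.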

As mentioned before, there do not seem to be any other proposals in the literature for the
two-sided testing problem. However, Berger and Pericchi (2001) do consider the 'one-sided
testing' version, namely testing $M_1 : \theta = \theta_0$ vs. $M_2 : \theta > \theta_0$; they
conjecture that the arithmetic intrinsic prior for this problem is the proper density

$$\pi^A(\theta) = -e^{\theta-\theta_0}\,\log\big(1 - e^{\theta_0-\theta}\big) - 1, \qquad \theta > \theta_0,$$

which is a decreasing and unbounded function of $\theta$. We next compare the (minimum) DB prior
for this problem with Berger and Pericchi's proposal.

Although our original formulation appears to be in terms of two-sided testing (see (1)), in
reality it suffices to define $\Theta$ appropriately to cover other testing situations. For
instance, in our one-sided testing, we take $\Theta = [\theta_0,\infty)$. The (minimum) DB
prior is

$$\pi^M(\theta) = (1 + 2(\theta-\theta_0))^{-3/2}, \qquad \theta > \theta_0.$$

It can be checked that $\pi^M$ meets condition (19) for $k = 1$ and hence $\pi^M$ produces
evidence consistent Bayes factors $\forall n \geq 1$. The priors $\pi^A$ and $\pi^M$ are
displayed in Figure 9. We find that also in this example $\pi^M$ has thicker tails.

Figure 9: Irregular, one-sided testing problem: $\pi^A$ and $\pi^M$ (solid and dotted lines)
for the case $\theta_0 = 0$.

            T        0.02     0.05     0.10     0.20     0.50     1.00
  n = 10    B^M_12   46.56    16.66    6.83     2.19     0.16     0.002
            B^A_12   11.54     5.16    2.57     1.02     0.10     0.001
  n = 20    B^M_12   41.96    12.65    3.75     0.55     0.002    2·10^-7
            B^A_12   10.52     4.04    1.50     0.28     0.002    2·10^-7

Table 5: Irregular models, one-sided testing. Values of $B_{12}$ for different values of $T$,
$n$ and for the two priors $\pi^A$, $\pi^M$, when testing $\theta_0 = 0$.

In this one-sided testing scenario (in sharp contrast to the behavior in the two-sided testing)
the Bayes factor in favor of $M_1$ for every $n > 0$ does grow to $\infty$ as the evidence in
favor of $M_1$ grows. Indeed, the Bayes factor $B_{12}$ is

$$B_{12} = \Big( \int_{\theta_0}^{T} \exp\{n(\theta-\theta_0)\}\,\pi(\theta)\,d\theta \Big)^{-1},$$

so that $B_{12} \to \infty$ when $T \to \theta_0$, $\forall n > 0$, no matter what prior is
used. Note that here $\theta_0$ is in the boundary of the parameter space.

In Table 5 we produce the Bayes factors computed with $\pi^A$ and $\pi^M$ when $\theta_0 = 0$
for various values of $T = \min\{y_1,\dots,y_n\}$, and for $n = 10$ and $n = 20$. For small
values of $T$ ($T < 0.20$), when evidence supports $M_1$, $B_{12}^M$ is considerably larger
than $B_{12}^A$, thus giving more support to $M_1$. For larger values of $T$ (that is, when
data contradict $M_1$) both priors result in very similar Bayes factors.
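
The one-sided Bayes factors are one-dimensional integrals and can be evaluated directly. The sketch below assumes $B_{12} = \{\int_{\theta_0}^{T} e^{n(\theta-\theta_0)}\pi(\theta)\,d\theta\}^{-1}$, the one-sided minimum DB prior $(1+2(\theta-\theta_0))^{-3/2}$, and the conjectured arithmetic intrinsic prior $-e^{\theta-\theta_0}\log(1-e^{\theta_0-\theta})-1$; the midpoint rule and the function names are choices made for this illustration.

```python
import math

def min_db_prior(theta, theta0=0.0):
    """One-sided minimum DB prior on theta > theta0."""
    return (1.0 + 2.0 * (theta - theta0)) ** -1.5

def arithmetic_intrinsic_prior(theta, theta0=0.0):
    """Conjectured arithmetic intrinsic prior on theta > theta0."""
    d = theta - theta0
    # -expm1(-d) equals 1 - exp(-d) and stays accurate for small d.
    return -math.exp(d) * math.log(-math.expm1(-d)) - 1.0

def bayes_factor_12(prior, t_min, n, theta0=0.0, steps=200_000):
    """B12 for the observed minimum t_min > theta0, by the midpoint rule."""
    h = (t_min - theta0) / steps
    total = 0.0
    for i in range(steps):
        d = (i + 0.5) * h  # midpoints avoid the log singularity at theta0
        total += math.exp(n * d) * prior(theta0 + d, theta0)
    return 1.0 / (total * h)

for n in (10, 20):
    for t in (0.02, 0.20, 1.00):
        print(n, t,
              round(bayes_factor_12(min_db_prior, t, n), 3),
              round(bayes_factor_12(arithmetic_intrinsic_prior, t, n), 3))
```

Under these assumptions the computed values agree with the entries of Table 5 to the precision printed there, for instance about 46.56 and 11.54 at $T = 0.02$, $n = 10$.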
3.5 Mixture models (Example 5)

Mixture models are among the most challenging scenarios for objective Bayesian methodology.
These models have improper likelihoods, i.e., likelihoods for which no improper prior yields a
finite marginal density (integrated likelihood). Recently, Perez and Berger (2001) have used
expected posterior priors (see Perez and Berger, 2002) to derive objective estimation priors,
but basically no general method seems to exist for deriving objective priors for testing with
these models.

However, the divergence measures are well defined (although the integrals are now more
involved), providing a reasonable DB prior to be used in model selection. We consider a simple
illustration. Assume

$$f(y \mid \mu, p) = p\,N(y \mid 0, 1) + (1-p)\,N(y \mid \mu, 1),$$

and the testing of $H_1 : \mu = 0$ vs. $H_2 : \mu \neq 0$, where $p < 1$ is known (if $p = 1$,
both hypotheses define the same model). As Berger and Pericchi (2001) point out, there is no
minimal training sample for this problem and hence the intrinsic Bayes factor cannot be defined.
The fractional Bayes factor does not exist either. The only prior we know for this problem is
the recommendation in Berger and Pericchi (2001) of using $\pi^{BP}(\mu) = Ca(\mu \mid 0, 1)$.

Although there is no formal $\pi^N(\mu)$ here, $\pi^N(\mu) = 1$ is usually assumed (see for
instance Perez and Berger, 2002). It can be shown that $\underline{q}^M = \infty$, and hence
$\pi^M$ does not exist. Let

$$G(p,\mu,\mu^*) = \int_{-\infty}^{\infty} \log\Big[1 + \frac{1-p}{p}\,e^{y\mu - \mu^2/2}\Big]\,N(y \mid \mu^*, 1)\,dy. \qquad (20)$$

Then

$$D^S[\mu,\mu_0] = n(1-p)\,\big(G(p,\mu,\mu) - G(p,\mu,0)\big).$$

It can be shown that $\underline{q}^S < \infty$, and hence that the sum DB prior $\pi^S$ exists.
The normalizing constant, however, cannot be derived in closed form. Numerical procedures could
be used to exactly derive the sum-DB prior. We use instead a Laplace approximation (see Tanner
1996) to (20) to get an approximate DB prior. Specifically

$$G(p,\mu,\mu^*) \approx \log\Big[1 + \frac{1-p}{p}\,e^{\mu\mu^* - \mu^2/2}\Big] = G^L(p,\mu,\mu^*). \qquad (21)$$

Figure 10 shows $G(p,\mu,\mu) - G(p,\mu,0)$ and its approximation
$G^L(p,\mu,\mu) - G^L(p,\mu,0)$ for $p = 0.5$ and $p = 0.75$. The approximation is very good as
long as $p$ is not too extreme.

We can now use this approximation to derive the DB prior. Note that the natural effective
sample size here is $n^* = n(1-p)$, so that the unitary sum-symmetrized divergence is

$$\bar{D}^S(\mu,\mu_0) \approx \bar{D}^{SL}(\mu,\mu_0) = G^L(p,\mu,\mu) - G^L(p,\mu,0).$$

Figure 10: $G(p,\mu,\mu) - G(p,\mu,0)$ (solid) and its Laplace approximation
$G^L(p,\mu,\mu) - G^L(p,\mu,0)$ (dots). Left: $p = 0.50$. Right: $p = 0.75$.

This approximation is specially appealing because it also keeps essential properties of the
divergence measures. In particular, $\bar{D}^{SL}(\mu,\mu_0) \geq \bar{D}^{SL}(\mu_0,\mu_0) = 0$,
so that the approximate DB prior

$$\pi^{SL}(\mu) \propto \big(1 + \bar{D}^{SL}(\mu,\mu_0)\big)^{-q_*}$$

has a mode at zero. Since $\underline{q} = 1/2$, we finally get

$$\pi^{SL}(\mu) \propto \big(1 + \bar{D}^{SL}(\mu,\mu_0)\big)^{-1}.$$

Interestingly, the prior $\pi^{SL}$ is close to a Cauchy density, which was Berger and
Pericchi's proposal, although the scale differs. Indeed a Taylor expansion of order 3 around
$\mu = 0$ gives

$$\bar{D}^{SL}(\mu,\mu_0) \approx (1-p)\,\mu^2, \qquad (22)$$

so that, unless $p$ is very close to 1, $\pi^{SL}$ behaves around 0 as a
$Ca(\mu \mid 0, 1/(1-p))$; the approximation is excellent when $p$ is close to 0.5. In the
tails, on the other hand, we have that, as $|\mu| \to \infty$,

$$\bar{D}^{SL}(\mu,\mu_0) \approx \frac{\mu^2}{2}, \qquad (23)$$

independently of $p$. Hence, the tails of $\pi^{SL}$ are close to those of a
$Ca(\mu \mid 0, 2)$ density. Note that both approximations (22) and (23) coincide for
$p = 0.5$.

Figure 12: $\bar{B}_{12}(n,p,\pi)$ for $\pi = \pi^{SL}$ (solid line) and
$\pi = Ca(\mu \mid 0,1)$ (dots) as a function of $n$, for $p = 0.5$.
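
The two regimes (22) and (23) can be checked numerically. The sketch below assumes the Laplace form $G^L(p,\mu,\mu^*) = \log[1 + \frac{1-p}{p}e^{\mu\mu^*-\mu^2/2}]$ and compares the resulting unitary divergence with a direct quadrature of (20); the function names and grid sizes are illustrative, and the code is meant for moderate $|\mu|$ only, since the exponentials overflow for very large values.

```python
import math

def d_sl(mu, p):
    """Laplace-approximated unitary sum divergence G^L(p,mu,mu) - G^L(p,mu,0)."""
    odds = (1.0 - p) / p
    x = 0.5 * mu * mu
    return math.log1p(odds * math.exp(x)) - math.log1p(odds * math.exp(-x))

def g_exact(p, mu, mu_star, half_width=10.0, steps=4000):
    """G(p, mu, mu*) of (20) by the midpoint rule around mu*."""
    odds = (1.0 - p) / p
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        y = mu_star - half_width + (i + 0.5) * h
        density = math.exp(-0.5 * (y - mu_star) ** 2) / math.sqrt(2.0 * math.pi)
        total += math.log1p(odds * math.exp(y * mu - 0.5 * mu * mu)) * density
    return total * h

def d_exact(mu, p):
    """Unitary sum divergence from (20), without the Laplace step."""
    return g_exact(p, mu, mu) - g_exact(p, mu, 0.0)

for p in (0.25, 0.5, 0.75):
    print(p, [(mu, round(d_exact(mu, p), 4), round(d_sl(mu, p), 4)) for mu in (0.5, 1.0, 2.0)])
```

A short calculation suggests that for $p = 0.5$ both the quadrature and the Laplace form reduce to $\mu^2/2$, which would explain why the approximation is reported as excellent near $p = 0.5$ and why (22) and (23) coincide there.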

The scale of the $Ca(\mu \mid 0, 1/(1-p))$ makes intuitive sense. Indeed, the larger $p$, the
fewer observations providing information about $\mu$ we get, and the DB prior adjusts to a less
informative likelihood by inflating its scale. Figure 11 displays $\pi^{SL}$, its
$Ca(\mu \mid 0, 1/(1-p))$ approximation, and the proposal of Berger and Pericchi (2001) for
different values of $p$. Notice that, for values of $p$ close to 0, $\pi^{SL}$ (and its
approximation $Ca(0, 1/(1-p))$) approximately behaves as a $Ca(0,1)$, the Berger and Pericchi
proposal (see Figure 11, right). This has an interesting interpretation since, as $p \to 0$,
the testing problem in this example essentially coincides with that of testing
$H_1 : \mu = 0$ vs. $H_2 : \mu \neq 0$, when $\mu$ is the mean of a normal density, for which
the $Ca(\mu \mid 0, 1)$ is perhaps the most popular prior to be used as prior distribution for
$\mu$ under $H_2$.

In this example, the DB prior (as well as Berger and Pericchi's proposal) again produces
evidence consistent Bayes factors for all $n$. Indeed, it can be shown that if one of the
$y_i$'s tends to $\infty$ or $-\infty$, then the corresponding Bayes factor tends to 0 no
matter what prior is used. On the other hand, as the evidence for $H_1$ increases, we get a
finite upper bound on $B_{12}$ for every fixed sample size $n$:

$$\bar{B}_{12}(n,p,\pi) = \lim B_{12}.$$

In Figure 12 we show $\bar{B}_{12}$ for $\pi = \pi^{SL}$ and $\pi = Ca(\mu \mid 0, 1)$ as a
function of $n$ for $p = 0.5$. As in the previous examples, it is an immediate consequence that
$\bar{B}_{12}(n,p,\pi) \to \infty$ as $n \to \infty$ for both priors, but the support for $H_1$
is larger when $\pi^{SL}$ is used for every $n$.

In Table 6 we show the Bayes factors computed respectively with the prior $\pi^{SL}$, its
$Ca(\mu \mid 0, 1/(1-p))$ approximation and the $Ca(\mu \mid 0, 1)$ proposed by Berger and
Pericchi. Since reduction by sufficient statistic is not possible, the Bayes factors are
computed for simulated samples of size $n = 20$, with mean $\mu \in \{0, 0.5, 1\}$, and
$p \in \{0.25, 0.5, 0.75\}$. $B_{12}^{SL}$ and its approximation are very close, demonstrating
that the approximation is very good for the considered range of $p$. $B_{12}^{SL}$ and
$B_{12}^{BP}$ are also very similar.

               p = 0.25                   p = 0.5                    p = 0.75
  mu       SL    approx.    BP        SL    approx.    BP        SL    approx.    BP
  0       5.49    4.97     4.39      2.56    2.56     2.01      2.37    2.90     1.87
  0.5     1.82    1.65     1.49      0.36    0.36     0.33      1.69    2.06     1.42
  1       0.07    0.06     0.06      0.04    0.04     0.04      0.01    0.01     0.01

Table 6: Bayes factors $B_{12}$ for simulated samples of size $n = 20$ from the mixture model
with various values of $p$ and $\mu$ and the priors $\pi^{SL}$ (SL), its approximation
$Ca(\mu \mid 0, 1/(1-p))$ (approx.) and $\pi^{BP}(\mu) = Ca(\mu \mid 0, 1)$ (BP).

4 Nuisance parameters 

In this section we deal with more realistic problems in which the distribution of the data is
not fully specified under the null (simplest model), but depends on some nuisance parameter.
Assume that $y_i$, $i = 1,\dots,n$, are independent (not necessarily i.i.d.) and that
$y = (y_1,\dots,y_n) \sim \{f(y \mid \theta,\nu),\ \theta \in \Theta,\ \nu \in \Upsilon\}$. We
want to test $H_1 : \theta = \theta_0$ vs. $H_2 : \theta \neq \theta_0$. Equivalently, we want
to solve the model selection problem of Section 1, where it is carefully acknowledged that
$\nu$ can have different meanings in each model.

However, from now on we assume, after suitable reparameterization if needed, that $\theta$ and
$\nu$ are orthogonal (that is, that the Fisher information matrix is block diagonal). It is then
customary to assume that $\nu$ has the same meaning under both models (see Berger and Pericchi,
1996, for an asymptotic justification). This will be needed for the divergence measures to have
intuitive meaning, and also to justify assessment of the same (possibly improper) prior for
$\nu$ under both models, thus considerably simplifying the assessment task. The suitability of
orthogonal parameters in the presence of model uncertainty was first exploited by Jeffreys
(1961) and it has been successfully used by many others (see for example Zellner and Siow,
1980, 1984, and Clyde, DeSimone and Parmigiani, 1996). For univariate $\theta$, Cox and Reid
(1987) explicitly provide an orthogonal reparameterization.

Accordingly, we assume that the hypothesis testing problem above is equivalent to that of
choosing between the competing models:

$$M_1 : f_1(y \mid \nu) = f(y \mid \theta_0,\nu) \quad \text{vs.} \quad M_2 : f_2(y \mid \theta,\nu) = f(y \mid \theta,\nu), \qquad (24)$$

where $\theta_0 \in \Theta$ is a specified value, and $\nu$ (the old parameter in Jeffreys'
terminology) is assumed to be common to both models, which only differ by the different value
of the new parameter $\theta$ under $M_2$.

4.1 Divergence Measures 

The basic measure of discrepancy between $\theta$ and $\theta_0$ is again the Kullback-Leibler
directed divergence, where $\nu$ is taken to be the same in both models:

$$KL[(\theta_0,\nu) : (\theta,\nu)] = \int_{\mathcal{Y}} \big(\log f(y \mid \theta,\nu) - \log f(y \mid \theta_0,\nu)\big)\, f(y \mid \theta,\nu)\,dy.$$

Note that using the same $\nu$ only makes intuitive sense if $\nu$ has the same meaning under
both models, and hence can be considered common. Actually, Perez (2005), using geometrical
arguments, shows that under orthogonality $KL[(\theta_0,\nu) : (\theta,\nu)]$ can be
interpreted as a measure of divergence between $f_1$ and $f_2$ due solely to the parameter of
interest $\theta$. This interpretation does not hold for other divergence measures, such as the
intrinsic loss divergence defined in Bernardo and Rueda (2002).

Similarly to Section 2, we symmetrize the Kullback-Leibler directed divergences by adding them
or taking their minimum, resulting in the sum-divergence and min-divergence measures between
$\theta$ and $\theta_0$ for a given $\nu$:

$$D^S[(\theta,\theta_0) \mid \nu] = KL[(\theta,\nu) : (\theta_0,\nu)] + KL[(\theta_0,\nu) : (\theta,\nu)], \qquad (25)$$

and

$$D^M[(\theta,\theta_0) \mid \nu] = 2 \times \min\{KL[(\theta,\nu) : (\theta_0,\nu)],\ KL[(\theta_0,\nu) : (\theta,\nu)]\}. \qquad (26)$$

$D^M$ is used by Perez (2005) to define what he calls the "orthogonal intrinsic loss".

In what follows, many of the definitions and properties apply to both $D^S$ and $D^M$, in which
case we again generically use $D$ to denote any of them. Their basic properties were discussed
in Section 2. As before, the building block of the DB prior is the unitary measure of divergence
$\bar{D} = D/n^*$, where $n^*$ is the equivalent sample size for $\theta$.

4.2 DB priors in the presence of nuisance parameters 

For testing $H_1 : \theta = \theta_0$ vs. $H_2 : \theta \neq \theta_0$, or equivalently
choosing between models $M_1$ and $M_2$ in (24), we need priors $\pi_1(\nu)$ under $M_1$ and
$\pi_2(\theta,\nu)$ under $M_2$.

In the spirit of Jeffreys (and many others after him) we take (under each of the models) the
same objective (possibly improper) prior for the common parameter $\nu$ and a proper prior for
the conditional distribution of the new parameter $\theta \mid \nu$ under $M_2$, which will be
derived similarly to the DB priors in Section 2.2. Note that since $\nu$ occurs in the two
models, if we take the same $\pi^N(\nu)$ in both, then the (common) arbitrary constants cancel
when computing the Bayes factor; however $\theta$, which only occurs in $M_2$, has to have a
proper prior. A common prior for the old parameter only makes sense when $\nu$ has the same
meaning in both models (another reason to take $\theta$ and $\nu$ orthogonal). Moreover, it is
well known that under orthogonality, the specific common prior for $\nu$ has little impact on
the resulting Bayes factor (see Jeffreys 1961; Kass and Vaidyanathan 1992), thus supporting use
of objective priors for common parameters.

Let $\pi_1^N(\nu)$ be an objective (usually either Jeffreys or reference) prior for model $f_1$
and $\pi_2^N(\theta,\nu)$ the corresponding one for model $f_2$ ($\theta$ is of interest if the
reference prior is used). We define $\pi^N(\theta \mid \nu)$ such that

$$\pi_2^N(\theta,\nu) = \pi^N(\theta \mid \nu)\,\pi_1^N(\nu).$$

To define the DB priors, let $\bar{D}$ be the unitary version of any of (25) or (26) (other
appropriate divergence measures could also be explored). Then we define:

Definition 4.1. (DB priors) Let
$c(q,\nu) = \int \big(1 + \bar{D}[(\theta,\theta_0) \mid \nu]\big)^{-q}\,\pi^N(\theta \mid \nu)\,d\theta$, and

$$\underline{q} = \inf\{ q \geq 0 : c(q,\nu) < \infty,\ \text{a.e. } \nu \in \Upsilon \}, \qquad q_* = \underline{q} + 1/2.$$

If $\underline{q} < \infty$, the D-divergence based prior under $M_1$ is
$\pi_1^D(\nu) = \pi_1^N(\nu)$, and under $M_2$ is
$\pi_2^D(\theta,\nu) = \pi^D(\theta \mid \nu)\,\pi_1^N(\nu)$, where the (proper)
$\pi^D(\theta \mid \nu)$ is

$$\pi^D(\theta \mid \nu) = c(q_*,\nu)^{-1}\,\big(1 + \bar{D}[(\theta,\theta_0) \mid \nu]\big)^{-q_*}\,\pi^N(\theta \mid \nu).$$

In this definition we are implicitly using the recommended non-increasing function
$h_q(t) = (1 + t)^{-q}$, but again other non-increasing functions on $t \in [0,\infty)$ could be
explored.

Definition 4.2. (Sum and Minimum DB priors) The sum DB prior $\pi^S$ and the minimum DB prior
$\pi^M$ are the DB priors given in Definition 4.1 with $D$ being respectively $D^S$ (see (25))
and $D^M$ (see (26)). When needed, we refer to their corresponding $c$'s and $q$'s as
$c_S, \underline{q}^S, q_*^S$, and $c_M, \underline{q}^M, q_*^M$, respectively.

We next investigate whether the DB priors are invariant under reparameterizations. Suppose
that $\xi = \xi(\theta)$ and $\eta = \eta(\nu)$ are, respectively, one-to-one monotone mappings
$\xi : \Theta \to \Theta_\xi$, $\eta : \Upsilon \to \Upsilon_\eta$. Clearly, the
reparameterization $(\xi,\eta)$ preserves orthogonality.

The original problem (24) in this parameterization becomes:

$$M_1^* : f_1^*(y \mid \eta) = f^*(y \mid \xi_0,\eta) \quad \text{vs.} \quad M_2^* : f_2^*(y \mid \xi,\eta) = f^*(y \mid \xi,\eta), \qquad (27)$$

where $f^*(y \mid \xi(\theta),\eta(\nu)) = f(y \mid \theta,\nu)$ and $\xi_0 = \xi(\theta_0)$. We
next show that if $\pi_1^N(\nu)$ and $\pi_2^N(\theta,\nu)$ are invariant under these
reparameterizations, so are the DB priors. (See Datta and Ghosh, 1995, for a detailed analysis
of the invariance of several non-informative priors in the presence of nuisance parameters.)

Theorem 1. (Invariance under one-to-one transformations.) Let $\pi_1^D(\nu)$ and
$\pi_1^D(\eta)$ be either the sum or the minimum DB priors under $M_1$ for the original (24)
and reparameterized (27) problems, respectively, and similar notation for
$\pi_2^D(\theta,\nu)$ and $\pi_2^D(\xi,\eta)$ under $M_2$. If
$\pi_1^N(\nu) = \kappa\,\pi_1^N(\eta(\nu))\,|J_\eta(\nu)|$, where $\kappa$ is a constant, and
$\pi_2^N(\theta,\nu) \propto \pi_2^N(\xi(\theta),\eta(\nu))\,|J_{\xi,\eta}(\theta,\nu)|$, then
the DB priors $\pi_1^D$ and $\pi_2^D$ are invariant in the same sense.

Proof. See Appendix. □

As a consequence, DB Bayes factors are not affected by reparameterizations of the type
considered. These are the most natural and interesting reparameterizations of the problem (and
indeed other reparameterizations seem questionable). Also, the DB priors are compatible with
reduction by sufficiency in the same spirit as in Proposition 2.



4.3 Examples 

We next demonstrate the behavior of DB priors and corresponding Bayes factors in a couple of 
examples. The first is testing the mean of a gamma model, a difficult problem in general. The 
second discusses linear models. 

4.3.1 Gamma model (Example 6) 

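Let $y = (y_1,\dots,y_n)$ be an iid sample from a Gamma model with mean $\mu$ and shape
parameter $\alpha$, that is, from

$$f(y \mid \alpha,\mu) = \Big(\frac{\alpha}{\mu}\Big)^{\alpha}\,\Gamma(\alpha)^{-1}\,y^{\alpha-1}\,e^{-\alpha y/\mu}.$$

It is desired to test $H_1 : \mu = \mu_0$ vs. $H_2 : \mu \neq \mu_0$. It is easy to show that
$\mu$ is orthogonal to $\alpha$.

The objective (reference) priors are $\pi_1^N(\alpha) = (\psi^{(1)}(\alpha) - 1/\alpha)^{1/2}$
and $\pi_2^N(\mu,\alpha) = \mu^{-1}(\psi^{(1)}(\alpha) - 1/\alpha)^{1/2}$, where $\psi^{(1)}$
represents the trigamma function (the first derivative of the digamma function). Hence
$\pi^N(\mu \mid \alpha) = \mu^{-1}$.

The DB priors are $\pi_1^D(\alpha) = \pi_1^N(\alpha)$, under both hypotheses and for $D$ either
the sum or min divergence. Under $H_2$, the conditional sum-DB prior for $\mu$ is

$$\pi^S(\mu \mid \alpha) = c_s(\alpha)^{-1}\,\Big(1 + \alpha\,\frac{(\mu-\mu_0)^2}{\mu\,\mu_0}\Big)^{-1/2}\,\frac{1}{\mu},$$

where $c_s(\alpha)$ is the proportionality constant

$$c_s(\alpha) = \int_0^{\infty} \Big(1 + \alpha\,\frac{(t-1)^2}{t}\Big)^{-1/2}\,\frac{1}{t}\,dt.$$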



               mu-hat = 10           mu-hat = 11           mu-hat = 12
  sigma-hat    B^S      B^M         B^S      B^M         B^S        B^M
  0.5         12.94     2.83       0.005    0.004       1·10^-?    3·10^-?
  1           11.27     2.92       0.353    0.150       0.003      0.003
  2            9.49     3.06       3.102    1.136       0.22       0.12

Table 7: Values of $B_{12}$ for gamma mean testing with $\mu_0 = 10$; we use $n = 10$, and
different values of $(\hat\mu, \hat\sigma)$.



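The conditional min-DB prior is

$$\pi^M(\mu \mid \alpha) = c_m(\alpha)^{-1}\,\big(1 + \bar{D}^M[(\mu,\mu_0) \mid \alpha]\big)^{-3/2}\,\frac{1}{\mu},$$

where

$$\bar{D}^M[(\mu,\mu_0) \mid \alpha] =
\begin{cases}
2\alpha\big(\log\frac{\mu}{\mu_0} - 1 + \frac{\mu_0}{\mu}\big) & \text{if } \mu > \mu_0, \\
2\alpha\big(\log\frac{\mu_0}{\mu} - 1 + \frac{\mu}{\mu_0}\big) & \text{if } \mu \leq \mu_0,
\end{cases}$$

and

$$c_m(\alpha) = 2\int_0^{\infty} \big(1 + 2\alpha\,(t - 1 + e^{-t})\big)^{-3/2}\,dt.$$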



In Table 7 we show the corresponding Bayes factors $B_{12}^S$ and $B_{12}^M$ for $n = 10$; the
null value is $\mu_0 = 10$, and we have considered several combinations of
$(\hat\mu, \hat\sigma)$, the maximum likelihood estimates of the mean and standard deviation.
When $\hat\mu = 12$ (casting doubt on the null), both Bayes factors are very similar, and
increasing with $\hat\sigma$, an intuitive behavior. When the data shows the most support for
the null, that is, when $\hat\mu = 10$, the Bayes factors differ, with the sum-DB prior giving
the most support to the null.
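
The two unitary divergences and the normalizing constants of the conditional DB priors for the gamma mean are easy to evaluate numerically. The sketch below assumes $\bar{D}^S = \alpha(\mu-\mu_0)^2/(\mu\mu_0)$ and the piecewise $\bar{D}^M$ of this example, with exponents $1/2$ and $3/2$; the change of variable $s = \log(\mu/\mu_0)$, the tail correction and all function names are choices made for this illustration.

```python
import math

def d_sum(mu, mu0, alpha):
    """Unitary sum divergence between gamma means at common shape alpha."""
    return alpha * (mu - mu0) ** 2 / (mu * mu0)

def d_min(mu, mu0, alpha):
    """Unitary min divergence (twice the smaller directed divergence)."""
    r = mu / mu0 if mu > mu0 else mu0 / mu      # ratio >= 1
    return 2.0 * alpha * (math.log(r) - 1.0 + 1.0 / r)

def c_sum(alpha, half_width=80.0, steps=160_000):
    """c_s(alpha); with s = log(mu/mu0) the integrand is (1 + 4 alpha sinh^2(s/2))^(-1/2)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        s = -half_width + (i + 0.5) * h
        total += 1.0 / math.sqrt(1.0 + 4.0 * alpha * math.sinh(0.5 * s) ** 2)
    return total * h

def c_min(alpha, upper=200.0, steps=200_000):
    """c_m(alpha) by the midpoint rule on [0, upper] plus an analytic tail."""
    h = upper / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h
        total += (1.0 + 2.0 * alpha * (t - 1.0 + math.exp(-t))) ** -1.5
    # Beyond `upper`, exp(-t) is negligible and the integrand integrates in closed form.
    tail = (1.0 + 2.0 * alpha * (upper - 1.0)) ** -0.5 / alpha
    return 2.0 * (total * h + tail)

for alpha in (0.25, 1.0, 4.0):
    print(alpha, round(c_sum(alpha), 4), round(c_min(alpha), 4))
```

One convenient sanity check under these assumptions is $\alpha = 1/4$, where the sum-DB integrand becomes $1/\cosh(s/2)$ and the constant equals $2\pi$. The truncation in `c_sum` is adequate for moderate $\alpha$ and should be widened for very small $\alpha$.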

In contrast with DB priors, it is not possible to derive relatively simple expressions for the
intrinsic priors. Hence, in this example, we compare the DB Bayes factors with the intrinsic
arithmetic Bayes factor $IB_{12}^A$ (see Berger and Pericchi 1996). Although $IB_{12}^A$ does
not exactly correspond to a Bayes factor derived from a specific prior, it does asymptotically
correspond to a Bayes factor derived with the intrinsic arithmetic prior. Since $IB_{12}^A$ is
not defined with reduction by sufficiency, the comparisons are carried out for (specific)
simulated samples with the given parameters. In Table 8 we show the arithmetic intrinsic and DB
Bayes factors for testing $H_1 : \mu = 10$, with $n = 10$ and samples generated from Gamma
distributions with $\mu \in \{10, 11, 12\}$ and $\sigma \in \{0.5, 1.0, 2.0\}$. The resulting
MLEs $(\hat\mu, \hat\sigma)$ in lexicographical order are: {(10.02, 0.52), (9.98, 0.99),
(9.98, 1.97), (11.01, 0.48), (11.00, 0.99), (10.98, 1.99), (11.99, 0.51), (11.98, 0.99),
(12.01, 1.99)}. When $H_2$ is true ($\mu = 11$ or $\mu = 12$), the three measures are rather
close. Similar values are also obtained when the 'null' model $H_1$ is true and $\sigma = 2$.
In all these cases, the three measures provide support to the true model. Nevertheless, when
$H_1$ is true and the variance is small, the DB Bayes factors are very sensible (with
$B_{12}^S$ giving the largest support to the null) but $IB_{12}^A$ is not, giving support to
$H_2$. This behavior of $IB_{12}^A$ is likely due to the well known instability of $IB_{12}^A$
when the sample size is small (worsened in this case because the variance is small).

                  mu = 10                     mu = 11                        mu = 12
  sigma      B^S     B^M     IB^A        B^S     B^M     IB^A         B^S          B^M          IB^A
  0.5       13.17    2.93    0.08       0.004   0.003   0.001       1.4·10^-?    3.7·10^-?    0.1·10^-?
  1         11.15    2.88    0.55       0.33    0.14    0.07        0.003        0.003        0.001
  2          9.57    3.08    3.71       3.07    1.12    1.23        0.22         0.12         0.07

Table 8: For Gamma model problem, and test $H_1 : \mu = 10$ vs. $H_2 : \mu \neq 10$. In each
cell, values of $B_{12}$ and arithmetic intrinsic Bayes factor $IB_{12}^A$, associated with a
sample of size $n = 10$, from a Gamma model with mean $\mu$ and standard deviation $\sigma$.

4.3.2 Variable selection in linear models (Example 7). 

We briefly show next the motivating example for this paper; specifically we show how the DB
prior reproduces the Jeffreys-Zellner-Siow (JZS) prior for variable selection in linear models.
More elaborate examples of testing in linear models can be found in Bayarri and García-Donato
(2007). Derivations of DB priors for random effects are given in García-Donato and Sun (2007).

Consider the full rank General Linear Model
$\{N_n(y \mid X_1\beta_1 + X_e\beta_e,\ \sigma^2 I_n)\}$ and the problem of testing
$H_1 : \beta_e = 0$. After the usual orthogonal reparameterization (see e.g. Zellner and Siow
1984) and taking $n^* = n$ and $\pi^N(\beta_1,\beta_e,\sigma) = \sigma^{-1}$, the DB priors are

$$\pi_1^D(\beta_1,\sigma) = \sigma^{-1}, \qquad
\pi_2^D(\beta_1,\beta_e,\sigma) = \sigma^{-1}\,Ca_{k_e}\big(\beta_e \mid 0,\ n^*\sigma^2 (V^t V)^{-1}\big),$$

where $k_e$ is the dimension of $\beta_e$ and

$$V = (I_n - P_1)X_e, \qquad P_1 = X_1 (X_1^t X_1)^{-1} X_1^t.$$

Note that the exact matching of JZS and DB priors only occurs if the effective sample size is
$n^* = n$. This 'coincidence' was the original motivation for the specific choice
$\underline{q} + 1/2$ in the definition of DB priors (see García-Donato, 2003, for details).
However, $n^*$ might well depend on the design matrix (or covariates). For example, in the
linear model $Y = X\theta + \varepsilon$, with $X : n \times 1$ and $\theta$ scalar, it is
intuitively clear that if $X = (1,\dots,1)^t$ then $n^*$ should be $n$, but if
$X = (1,\epsilon,\dots,\epsilon)^t$ with $\epsilon$ very small, then $n^*$ should be 1. The
effective sample size defined in Berger et al. (2007) satisfies this requirement but other
definitions might not. Extended investigation of this issue is beyond the scope of this paper
and will be pursued elsewhere.

Since comparisons among existing objective Bayesian testing procedures for the linear model
have extensively been given in the literature, including Bayes factors derived with JZS priors,
we skip them here (see for example Berger, Ghosh and Mukhopadhyay, 2003; Liang et al., 2007;
Bayarri and García-Donato, 2007).

5 Approximations and computation 

In this section, we derive simple approximations to DB priors and show their connections with
already existing proposals. We also exploit the connection between DB Bayes factors and a
corrected Bayes factor computed with usual (possibly improper) non-informative priors to
propose easy MCMC computation of DB Bayes factors.

5.1 Approximated DB priors 

It is well known (see Kullback 1968; Schervish 1995) that the Kullback-Leibler divergence
measures can be approximated up to second order using the expected Fisher information, so that:

$$D^S[(\theta,\theta_0) \mid \nu] \approx (\theta-\theta_0)^t\,J_\theta(\theta_0,\nu)\,(\theta-\theta_0) \approx D^M[(\theta,\theta_0) \mid \nu],$$

where $J_\theta(\theta_0,\nu)$ is the block in the Fisher information matrix corresponding to
$\theta$, evaluated at $(\theta_0,\nu)$. Hence, for the problem (24) (recall that $\theta$ and
$\nu$ are orthogonal), the DB priors $\pi^D$ (either $\pi^S$ or $\pi^M$) can be approximated by
$\pi_1^D(\nu) = \pi_1^N(\nu)$ and

$$\pi^D(\theta \mid \nu) = c(q_*,\nu)^{-1}\,h_{q_*}\Big((\theta-\theta_0)^t\,\frac{J_\theta(\theta_0,\nu)}{n^*}\,(\theta-\theta_0)\Big)\,\pi^N(\theta \mid \nu), \qquad (28)$$

where now $q_* = \underline{q} + 1/2$, and $\underline{q}$ is the infimum of $q$ values for
which the conditional defined in (28) (in terms of Fisher information) is proper.

The cases when $\pi^N(\theta \mid \nu)$ does not depend on $\theta$ (so $\theta$ behaves
asymptotically as a location parameter) are specially interesting. It is easy to then show that
$\underline{q} = k/2$, where $k$ is the dimension of $\theta$, and hence

$$\pi^D(\theta \mid \nu) = Ca_k\big(\theta \mid \theta_0,\ n^* J_\theta^{-1}(\theta_0,\nu)\big). \qquad (29)$$

The conditional prior (29) has been interpreted by many authors (see for instance Kass and
Wasserman 1995) as the generalization of Jeffreys' ideas to multivariate problems.

Moreover, if $h_q(t) = e^{-qt}$ is used instead, then $\pi^D$ would essentially be the normal
unit information priors, as defined by Kass and Wasserman (1995) and further studied by Raftery
(1998). Note that we have shown that these proposals can be interpreted as approximated DB
priors only when $\theta$ is asymptotically a location parameter.
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5.2 Computation of Bayes factor 

Interestingly enough, and similarly to other objective Bayesian proposals (like the intrinsic and
fractional Bayes factors), it can be shown that Bayes factors computed with DB priors, $B^D_{21}$,
can be expressed as an (invalid) Bayes factor computed with non-informative (usually improper)
priors, $B^N_{21}$, multiplied by a correction factor. This expression also allows for easy computation
of DB Bayes factors when $B^N_{21}$ is easy to compute.

Lemma 5.1. For problem (24) (with $\theta$ and $\nu$ orthogonal), let $B^N_{21}$ denote the Bayes factor
computed using $\pi_1^N(\nu)$ and $\pi_2^N(\theta,\nu)$; then, for both the sum and min DB priors,

$$B^D_{21} = B^N_{21} \times E^{\pi_2^N(\theta,\nu\mid y)}\!\left( c(q_*,\nu)^{-1}\, h_{q_*}\!\left(\bar D[(\theta,\theta_0)\mid\nu]\right) \right). \qquad (30)$$

Proof. See Appendix. □ 

Computation of $B^N_{21}$ is often simpler than computation of proper Bayes factors. Then a
sample (usually MCMC) from the posterior distribution $\pi_2^N(\theta,\nu\mid y)$ can be used to evaluate
the expectation in (30), thus considerably simplifying computation of $B^S_{21}$ or $B^M_{21}$. This is
actually how we computed the Bayes factors for Example 6 in Section 4.3.1.
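
A minimal sketch of this computation, in a deliberately simple setting chosen here for illustration (it is not Example 6): $n$ normal observations with known standard deviation `sigma`, no nuisance parameter, a flat prior $\pi^N(\theta) = 1$, and unitary divergence $(\theta-\theta_0)^2/\sigma^2$, so that $q_* = 1$ and $c = \pi\sigma$. All function names are hypothetical, and direct draws from the exact normal posterior stand in for an MCMC sample. The quadrature routine is included only as a reference value against which the Monte Carlo answer can be compared.

```python
import math
import random


def correction_factor(post_draws, theta0, sigma):
    """Average of c^{-1} h_{q*}(D-bar) over posterior draws (q* = 1, c = pi * sigma)."""
    vals = [1.0 / (math.pi * sigma * (1.0 + ((th - theta0) / sigma) ** 2))
            for th in post_draws]
    return sum(vals) / len(vals)


def bf_flat_prior(ybar, n, theta0, sigma):
    """Bayes factor under the flat prior pi^N(theta) = 1, in closed form."""
    return math.sqrt(2.0 * math.pi * sigma ** 2 / n) * math.exp(
        n * (ybar - theta0) ** 2 / (2.0 * sigma ** 2))


def bf_db_monte_carlo(ybar, n, theta0, sigma, n_draws=100_000, seed=0):
    """Flat-prior Bayes factor times the Monte Carlo correction factor."""
    rng = random.Random(seed)
    # Posterior under the flat prior is N(ybar, sigma^2 / n).
    draws = [rng.gauss(ybar, sigma / math.sqrt(n)) for _ in range(n_draws)]
    return bf_flat_prior(ybar, n, theta0, sigma) * correction_factor(draws, theta0, sigma)


def bf_db_quadrature(ybar, n, theta0, sigma, n_grid=20000):
    """Reference value: likelihood ratio integrated against a Cauchy(theta0, sigma) prior.

    Uses theta = theta0 + sigma * tan(u), under which the Cauchy density times
    d(theta) becomes du / pi on (-pi/2, pi/2).
    """
    du = math.pi / n_grid
    total = 0.0
    for i in range(n_grid):
        u = -math.pi / 2 + (i + 0.5) * du
        theta = theta0 + sigma * math.tan(u)
        log_lr = -n * ((ybar - theta) ** 2 - (ybar - theta0) ** 2) / (2.0 * sigma ** 2)
        total += math.exp(log_lr) * du / math.pi
    return total


if __name__ == "__main__":
    print(bf_db_monte_carlo(0.5, 20, 0.0, 1.0), bf_db_quadrature(0.5, 20, 0.0, 1.0))
```

The appeal of the representation is that the only model-specific ingredients are a posterior sample under the non-informative prior and a routine evaluating the correction term at each draw.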

Moreover, if $n$ is large (relative to the dimension of $\phi = (\theta,\nu)$, assumed fixed) we can
approximate (30) using asymptotic expressions for the posterior distribution along with the approximated
DB priors given in (28).

We illustrate the approach in a simple setting. First we assume that the asymptotic posterior
distribution is given by (see conditions in e.g. Berger 1985)

$$\pi_2^N(\theta,\nu\mid y) \approx N\!\left(\hat\phi,\; J^{-1}(\hat\phi)\right),$$

where $\hat\phi = (\hat\theta,\hat\nu)$ is the (assumed to exist) maximum likelihood estimate of $(\theta,\nu)$ and
$J = J_\theta \oplus J_\nu$ is the (block diagonal) expected Fisher information matrix of $f(y\mid\theta,\nu)$.

Next we assume that $\pi_2^N(\theta\mid\nu)$ does not depend on $\theta$, so the approximating (conditional)
DB prior is the Cauchy prior in (29). As a notational device, it will be convenient to then write
$\pi_2^N(\theta\mid\nu)$ as $\pi_2^N(\theta_0\mid\nu)$. Expressing the Cauchy density (29) in the usual way as a scale mixture
of a Normal and an inverse gamma, and using the asymptotic posterior, the DB Bayes factors,
as given in (30), can be approximated by

$$B^D_{21} \approx B^N_{21} \iint \frac{1}{\pi_2^N(\theta_0\mid\nu)}\, N_k\!\left(\hat\theta \mid \theta_0, \Sigma(\nu,t)\right) N_p\!\left(\nu \mid \hat\nu, J_\nu^{-1}(\hat\phi)\right) d\nu \;\, IGa\!\left(t \mid \tfrac12, \tfrac12\right) dt,$$

where $p$ is the dimension of $\nu$ and $\Sigma(\nu,t) = t\,n^* J_\theta^{-1}(\theta_0,\nu) + J_\theta^{-1}(\hat\phi)$. A similar asymptotic
approximation to $B^N_{21}$ finally gives the desired asymptotic approximation to the DB Bayes
factor:



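$$B^D_{21} \approx \frac{p(y\mid\hat\phi)}{p(y\mid[\ldots])}\; \frac{1}{\det(J_\theta(\hat\phi))^{1/2}} \times \iint \frac{[\ldots]}{\pi_2^N(\theta_0\mid\nu)}\, N_k\!\left(\hat\theta \mid \theta_0, \Sigma(\nu,t)\right) N_p\!\left(\nu \mid \hat\nu, J_\nu^{-1}(\hat\phi)\right) IGa\!\left(t \mid \tfrac12, \tfrac12\right) d\nu\, dt,$$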



which is very easy to evaluate by simple Monte Carlo. Note that arbitrary constants in the
possibly improper $\pi_2^N(\theta\mid\nu)$ cancel out in the expression above.
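
The simple Monte Carlo step can be sketched as follows, under simplifying assumptions that are ours rather than the paper's: $\theta$ is scalar, there is no nuisance parameter (so the integral over $\nu$ and the prior ratio drop out), `prior_scale2` stands for $n^* J_\theta^{-1}(\theta_0)$ and `post_var` for $J_\theta^{-1}(\hat\phi)$. The function names are hypothetical. The mixing variable is drawn using the fact that if $1/t$ is chi-square with one degree of freedom then $t \sim IGa(1/2, 1/2)$; the quadrature routine integrates the Cauchy prior against the normal posterior directly and serves only as a check.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Univariate normal density with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_monte_carlo(theta_hat, theta0, prior_scale2, post_var,
                        n_draws=200_000, seed=1):
    """Average N(theta_hat | theta0, t * prior_scale2 + post_var) over t ~ IGa(1/2, 1/2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_draws):
        z = rng.gauss(0.0, 1.0)
        t = 1.0 / max(z * z, 1e-300)  # 1/t is chi-square(1); the floor avoids division by zero
        total += normal_pdf(theta_hat, theta0, t * prior_scale2 + post_var)
    return total / n_draws


def cauchy_normal_quadrature(theta_hat, theta0, prior_scale2, post_var, n_grid=20000):
    """Check value: Cauchy(theta0, scale) density integrated against N(theta_hat, post_var)."""
    scale = math.sqrt(prior_scale2)
    du = math.pi / n_grid
    total = 0.0
    for i in range(n_grid):
        u = -math.pi / 2 + (i + 0.5) * du
        theta = theta0 + scale * math.tan(u)  # Cauchy density times d(theta) equals du / pi
        total += normal_pdf(theta, theta_hat, post_var) * du / math.pi
    return total


if __name__ == "__main__":
    print(mixture_monte_carlo(0.5, 0.0, 1.0, 0.05),
          cauchy_normal_quadrature(0.5, 0.0, 1.0, 0.05))
```

With nuisance parameters, the same loop would additionally draw $\nu$ from its approximating normal distribution and evaluate $\Sigma(\nu,t)$ at each draw.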

6 Summary and conclusions 

Extending pioneering work by Jeffreys (1961), we propose a new class of priors for objective
Bayes hypothesis testing based on divergence measures, which we call 'Divergence Based' (DB)
priors. For divergence measures, we propose use of symmetrized versions (the sum and the minimum)
of Kullback-Leibler divergences. The resulting DB priors are usually easy to compute and have
a number of desirable properties, such as invariance under reparameterizations, evidence consistency
and compatibility with sufficient statistics. We explore DB priors in a series of study examples,
in which they prove to be intuitively sound and to produce sensible Bayes factors. This is so
even for irregular models and improper likelihoods, which are extremely challenging scenarios
for other objective Bayes testing methodologies. We recommend use of the sum-DB prior when
it exists because it is considerably easier to compute than the min-DB prior and seems to exhibit
nicer behavior.

The DB priors seem to behave similarly to the arithmetic intrinsic prior (when defined).
Also, in normal scenarios, they exactly reproduce the proposals of Jeffreys (1961) and Zellner
and Siow (1980, 1984), so that they can be considered an extension of these classical proposals to
non-normal situations. Approximations to DB priors are also shown to be connected with other
proposals, such as the unit information priors. Finally, we also provide asymptotic approximations to
DB Bayes factors for large sample sizes.

The definition of DB priors is based on particular choices of both 1) an 'objective prior'
$\pi^N$ for estimation problems and 2) an equivalent sample size $n^*$. Of course, there is no general
agreement in the literature about a single definition for any of these concepts (and there might
never be). We think that any sensible proposals would produce nice results, but this is an issue
that needs to be further investigated. We recommend, when possible, use of the reference prior
(Berger and Bernardo, 1992) and of the equivalent sample size in Berger et al. (2007).

Other apparently arbitrary choices that we made were those of $h_q$ and of $q_*$; however, they
were based on some compelling arguments:

• The function $h_q(t) = (1+t)^{-q}$ was specifically chosen to reproduce the Jeffreys-Zellner-Siow
priors in the normal case, but there are other reasons for it. A compelling reason is that it is
a simple function resulting in Bayes factors with nice properties; another simple function
to use could be the exponential, but this results in normal priors that are not evidence
consistent. Also, $h_q$ results in priors with very heavy tails, which is important so as not
to 'knock-out' the likelihood when data is not well explained by the null model. However,
we do not rule out that other choices of functions $h(t)$ which are decreasing for $t \in [0,\infty)$,
with maximum at zero, and producing proper DB-type priors could work better in specific
scenarios.

• Choice of $q_* = \underline{q} + 1/2$, with $\underline{q}$ the infimum of the $q$ values giving a proper prior. In
principle, any $\underline{q} + \delta$ could be used. As a matter of fact, we do not
expect that the specific choice of $\delta$ matters much as long as $\delta \in (0,1)$ (needed to produce
priors with heavy tails and no moments), but this again needs further investigation. We
recommend use of $\delta = 1/2$ because it is the value reproducing Jeffreys' proposal.

Appendix. Proofs.

Proof of Proposition 1. Let $\bar D^*[\xi,\xi_0]$ be the unitary measure of divergence between $f_1^*(y)$ and
$f_2^*(y\mid\xi)$ in (14). It is well known that KL remains the same under one-to-one reparameterizations,
and clearly $\bar D^*[\xi(\theta),\xi(\theta_0)] = \bar D[\theta,\theta_0]$. Now, by definition of DB priors, and using the
relation between $\pi_\theta^N$ and $\pi_\xi^N$, it follows that



Proof of Proposition 2. Let $D^*[\theta,\theta_0]$ be the symmetric divergence between $f_1(t)$ and $f_2(t\mid\theta)$ in
(15), and hence $D^*[\theta,\theta_0] = D[\theta,\theta_0]$. The result now follows from the assumption that neither
$\pi^N$ nor $n^*$ changes when the problem is formulated in terms of sufficient statistics.

Proof of Lemma 3.3. First we show that (18) implies that $B_{21} \to \infty$ as $y \to 0$. Assume
$\int \mu^{-n}\pi(\mu)\,d\mu = \infty$. Then

$$\lim_{y\to 0} m_2(y) = \lim_{y\to 0} \int \mu^{-n} e^{-ny/\mu}\, \pi(\mu)\,d\mu \ge \int \mu^{-n}\pi(\mu)\,d\mu = \infty,$$

and the result follows. To show the converse, note that, since $\pi(\mu)$ is proper,

$$\lim_{y\to 0} \int_1^\infty \mu^{-n} e^{-ny/\mu}\, \pi(\mu)\,d\mu < \infty. \qquad (31)$$

Now, by contradiction suppose that for $n > k$, $\int_0^\infty \mu^{-n}\pi(\mu)\,d\mu < \infty$, so in particular
$\int_0^1 \mu^{-n}\pi(\mu)\,d\mu < \infty$, and hence the limiting function $g(\mu) = \mu^{-n}\pi(\mu)$ is integrable; now, the
Dominated Convergence Theorem gives

$$\lim_{y\to 0} \int_0^1 \mu^{-n} e^{-ny/\mu}\, \pi(\mu)\,d\mu = \int_0^1 \mu^{-n}\pi(\mu)\,d\mu < \infty,$$

which jointly with (31) contradicts the assumption of $B_{21} \to \infty$ as $y \to 0$, proving the result.

Proof of Lemma 3.5. It can easily be seen that, as $T \to \infty$,

$$[\ldots] \int_{-\infty}^{\infty} e^{[\ldots]}\, \pi(\theta)\,d\theta.$$

Now, for all $n > k$, it follows that

/oo poo fOO 

-oo J —oo J 9o 

proving the lemma. 



Proof of Theorem 1. By definition, the DB priors for the reparameterized problem are $\pi_1^D(\eta) =
\pi_1^N(\eta)$ and (recall $h_q(t) = (1+t)^{-q}$)

where $\bar D^*[(\xi,\xi_0)\mid\eta]$ is the corresponding unitary measure of divergence between the competing
models $f_1^*$ and $f_2^*$ in (27) and

$$c^*(q_*,\eta) = \int h_{q_*}\!\left(\bar D^*[(\xi,\xi_0)\mid\eta]\right) \pi^N_{\xi,\eta}(\xi\mid\eta)\,d\xi.$$

It can be easily shown that $\bar D^*[(\xi,\xi_0)\mid\eta] = \bar D[(\theta,\theta_0)\mid\nu]$. Also, under the assumptions of the
theorem, $\pi^N_{\theta,\nu}(\theta,\nu) = K_2\, \pi^N_{\xi,\eta}(\xi(\theta),\eta(\nu))\, |J_{\xi,\eta}(\theta,\nu)|$, where $K_2$ is a constant. Then

and hence

$$c(q_*,\nu) = [\ldots]\; c^*(q_*,\eta(\nu)),$$

and the result follows.



Proof of Lemma 5.1. For $i = 1,2$, let $m_i^D(y)$ and $m_i^N(y)$ denote the prior predictive marginals
obtained with $\pi_i^D$ and $\pi_i^N$, respectively. By definition of DB priors, $m_1^D(y) = m_1^N(y)$, and hence

$$B^D_{21} = \frac{m_2^D(y)}{m_1^D(y)} = \frac{m_2^N(y)}{m_1^N(y)}\, \frac{m_2^D(y)}{m_2^N(y)} = B^N_{21}\, \frac{m_2^D(y)}{m_2^N(y)}.$$

Finally

$$m_2^D(y) = \int f(y\mid\theta,\nu)\, \pi_2^D(\theta,\nu)\,d\theta\,d\nu$$

$$= \int f(y\mid\theta,\nu)\, c(q_*,\nu)^{-1}\, h_{q_*}\!\left(\bar D[(\theta,\theta_0)\mid\nu]\right) \pi_2^N(\theta,\nu)\,d\theta\,d\nu$$

$$= m_2^N(y) \int c(q_*,\nu)^{-1}\, h_{q_*}\!\left(\bar D[(\theta,\theta_0)\mid\nu]\right) \pi_2^N(\theta,\nu\mid y)\,d\theta\,d\nu,$$

and the result holds.